MATERIALS FOR THE PRODUCTION OF
NANOMETER STRUCTURES AND USE THEREOF

FIELD OF THE INVENTION

The present invention pertains to nanostructures,
i.e., nanometer sized structures useful in the construction
of microscopic and macroscopic structures. In particular,
the present invention pertains to nanostructures based on
bacteriophage T4 tail fiber proteins and variants thereof.

BACKGROUND OF THE INVENTION

While the strength of most metallic and ceramic 
based materials derives from the theoretical bonding 
strengths between their component molecules and crystallite 
surfaces, it is significantly limited by flaws in their 
crystal or glass-like structures. These flaws are usually 
inherent in the raw materials themselves or developed during 
fabrication and are often expanded due to exposure to
environmental stresses. 

The emerging field of nanotechnology has made the
limitations of traditional materials more critical. The
ability to design and produce very small structures (i.e., of
nanometer dimensions) that can serve complex functions
depends upon the use of appropriate materials that can be
manipulated in predictable and reproducible ways, and that
have the properties required for each novel application.

Biological systems serve as a paradigm for 
sophisticated nanostructures. Living cells fabricate proteins 
and combine them into structures that are perfectly formed
and can resist damage in their normal environment. In some
cases, intricate structures are created by a process of
self-assembly, the instructions for which are built into the
component polypeptides. Finally, proteins are subject to
proofreading processes that ensure a high degree of quality
control. 

Therefore, there is a need in the art for methods 
and compositions that exploit these unique features of 
proteins to form constituents of synthetic nanostructures.
The need is to design materials whose properties can be
tailored to suit the particular requirements of
nanometer-scale technology. Moreover, since the subunits of
most macrostructural materials (ceramics, metals, fibers,
etc.) are based on the bonding of nanostructural subunits,
the fabrication of appropriate subunits without flaws and of
exact dimensions and uniformity should improve the strength
and consistency of the macrostructures because the surfaces
are more regular and can interact more closely over an
extended area than larger, more heterogeneous materials.

SUMMARY OF THE INVENTION

In one aspect, the present invention provides
isolated protein building blocks for nanostructures,
comprising modified tail fiber proteins of bacteriophage T4.
The gp34, 36, and 37 proteins are modified in various ways to
form novel rod structures with different properties.
Specific internal peptide sequences may be deleted without
affecting their ability to form dimers and associate with
their natural tail fiber partners. Alternatively, they may
be modified so that they: interact only with other modified,
and not native, tail fiber partners; exhibit thermolabile
interactions with their partners; or contain additional
functional groups that enable them to interact with
heterologous binding moieties.

The present invention also encompasses fusion
proteins that contain sequences from two or more different
tail fiber proteins. The gp35 protein, which forms an angle
joint, is modified so as to form average angles different
from the natural average angle of 137° (±7°) or 156° (±12°),
and to exhibit thermolabile interactions with its partners.

In another aspect, the present invention provides
nanostructures comprising native and modified tail fiber
proteins of bacteriophage T4. The nanostructures may be
one-dimensional rods, two-dimensional polygons or open or
closed sheets, or three-dimensional open cages or closed
solids.
BRIEF DESCRIPTION OF THE DRAWINGS

Figures 1A and 1B show a schematic representation
of the T4 bacteriophage particle (Figure 1A), and a schematic
representation of the T4 bacteriophage tail fiber (Figure
1B).

Figure 2 shows a schematic representation of a unit
rod.

Figures 3A-3D show schematic representations of: a
one-dimensional multi-unit rod joined along the x axis
(Figure 3A); closed simple sheets (Figure 3B); closed
brickwork sheets (Figure 3C); and open brickwork sheets
(Figure 3D).

Figure 4 shows a schematic representation of two
units used to construct porous and solid sheets (top and
bottom), which, when alternately layered, produce a
multi-tiered set of cages as shown.

Figure 5 shows a schematic representation of an
angled structure having an angle of 120°.

Figure 6 shows the DNA sequence (SEQ ID NO:1) of
genes 34, 35, 36, and 37 of bacteriophage T4.

Figure 7 shows the amino acid sequences (shown in
single-letter codes) of the gene products of genes 34
(SEQ ID NO:2, ORFX SEQ ID NO:3), 35 (SEQ ID NO:4), 36
(SEQ ID NO:5), and 37 (SEQ ID NO:6) of bacteriophage T4. The
amino acid sequences (bottom line of each pair) are aligned
with the nucleotide sequences (top line of each pair). It is
noted that the deduced protein sequence of gene 35 (from NCBI
database) is not believed to be accurate.

Figures 8A-8B show a schematic representation of:
the formation of a P37 dimer initiator from a molecule that
self-assembles into a dimer (Figure 8A); and the formation of
a P37 trimer initiator from a molecule that self-assembles
into a trimer (Figure 8B).

Figure 9 shows a schematic representation of the
formation of the polymer (P37-36)n with an initiator that is
a self-assembling dimer.

DETAILED DESCRIPTION OF THE INVENTION

All patents, patent applications and literature
references cited in the specification are hereby incorporated
by reference in their entirety. In the case of
inconsistencies, the present disclosure, including
definitions, will prevail.

Although the invention is described in terms of
bacteriophage T4 tail fiber proteins, it will be understood
that the invention is also applicable to tail fiber proteins
of other T-even-like phage, e.g., of the T4 family (e.g., T4,
TuIa, TuIb), and T2 family (T2, T6, K3, Ox2, M1, etc.).

DEFINITIONS:

"Nanostructures" are defined herein as structures
of different sizes and shapes that are assembled from
nanometer-sized protein components.

"Chimers" are defined herein as chimeric proteins
in which at least the amino- and carboxy-terminal regions are
derived from different original polypeptides, whether the
original polypeptides are naturally occurring or have been
modified by mutagenesis.

"Homodimers" are defined herein as assemblies of
two substantially identical protein subunits that form a
defined three-dimensional structure.

The designation "gp" denotes a monomeric
polypeptide, while the designation "P" denotes homooligomers.
P34, P36, and P37 are presumably homodimers or homotrimers.

An isolated polypeptide that "consists essentially
of" a specified amino acid sequence is defined herein as a
polypeptide having the specified sequence or a polypeptide
that contains conservative substitutions within that
sequence. Conservative substitutions, as those of ordinary
skill in the art would understand, are ones in which an
acidic residue is replaced by an acidic residue, a basic
residue by a basic residue, or a hydrophobic residue by a
hydrophobic residue. Also encompassed is a polypeptide that
lacks one or more amino acids at either the amino terminus or
carboxy terminus, up to a total of five at either terminus,
when the absence of the particular residues has no
discernible effect on the structure or the function of the
polypeptide in practicing the present invention.
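
As an illustration only, the sequence-level part of this
definition can be sketched in Python. The membership of each
residue class, the helper names, and the choice to allow
terminal deletions together with substitutions are assumptions
made for the sketch; the definition names only the three
categories. The functional condition (no discernible effect on
structure or function) cannot be evaluated from sequence and is
not modeled.

```python
# Residue classes are an illustrative assumption; the text names only the
# three categories (acidic, basic, hydrophobic), not their members.
ACIDIC = set("DE")
BASIC = set("KRH")
HYDROPHOBIC = set("AVLIMFW")
CLASSES = (ACIDIC, BASIC, HYDROPHOBIC)
MAX_TERMINAL_LOSS = 5  # "up to a total of five at either terminus"


def is_conservative(a: str, b: str) -> bool:
    """True if residues a and b are identical or share one of the classes."""
    return a == b or any(a in c and b in c for c in CLASSES)


def consists_essentially_of(candidate: str, reference: str) -> bool:
    """Sequence-level check only (hypothetical helper).

    Tries every allowed pair of terminal deletions on the reference and
    accepts the candidate if it matches the remaining core position by
    position, allowing conservative substitutions.
    """
    for n_cut in range(MAX_TERMINAL_LOSS + 1):
        for c_cut in range(MAX_TERMINAL_LOSS + 1):
            core = reference[n_cut:len(reference) - c_cut]
            if len(core) == len(candidate) and all(
                is_conservative(x, y) for x, y in zip(candidate, core)
            ):
                return True
    return False
```

For example, under these assumed classes a K-to-R or L-to-I
change is accepted, whereas an E-to-K change is not.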

The present invention pertains to a new class of
protein building blocks whose dimensions are measured in
nanometers, which are useful in the construction of
microscopic and macroscopic structures. Without wishing to
be bound by theory, it is believed that the basic unit is a
homodimer composed of two identical protein subunits having a
cross-β configuration, although a trimeric structure is also
possible. Thus, as will be apparent, references to a
"homodimer" or "dimerization" as used herein will in many
instances be construed as also referring to a homotrimer or
trimerization. These long, stiff, and stable rod-shaped
units can assemble with other rods using coupling devices
that can be attached genetically or in vitro. The ends of
one rod may attach to different ends of other rods or similar
rods. Variations in the length of the rods, in the angles of
attachment, and in their flexibility characteristics permit
differently-shaped structures to self-assemble in situ. In
this manner the units can self-assemble into predetermined
larger structures of one, two or three dimensions. The
self-assembly can be staged to form structures of precise
dimensions and uniform strength due to the flawless
biological manufacture of the components. The rods can also
be modified by genetic and chemical modifications to form
predetermined specific attachment sites for other chemical
entities, allowing the formation of complex structures.

An important aspect of the present invention is
that the protein units can be designed so that they comprise
rods of different lengths, and can be further modified to
include features that alter their surface properties in
predetermined ways and/or influence their ability to join
with other identical or different units. Furthermore, the
self-assembly capabilities can be expanded by producing
chimeric proteins that combine the properties of two
different members of this class. This design feature is
achieved by manipulating the structure of the genes encoding
these proteins.

As detailed below, the compositions and methods of
the present invention take advantage of the properties of the
natural proteins, i.e., the resulting structures are stiff,
strong, stable in aqueous media, heat resistant, protease
resistant, and can be rendered biodegradable. A large
quantity of units can be fabricated easily in microorganisms.
Furthermore, for ease of automation, large quantities of
parts and subassemblies can be stored and used as needed.

The sequences of the protein subunits are based on
the components of the tail fiber of the T4 bacteriophage of
E. coli. It will be understood that the principles and
techniques can be applied to the tail fibers of other T-even
phages, or other related bacteriophages that have similar
tail and/or fiber structures.

The structure of the T4 bacteriophage tail fiber
(illustrated in Figure 1) can be represented schematically as
follows (N = amino terminus, C = carboxy terminus): N[P34]C -
N[gp35]C - N[P36]C - N[P37]C. P34, P36, and P37 are all
stiff, rod-shaped protein homodimers in which two identical β
sheets, oriented in the same direction, are fused
face-to-face by hydrophobic interactions between the sheets
juxtaposed with a 180° rotational axis of symmetry through
the long axis of the rod. (The structure will vary if P34,
P36, and P37 are homotrimers.) gp35, by contrast, is a
monomeric polypeptide that attaches specifically to the
N-terminus of P36 and then to the C-terminus of P34 and forms
an angle joint between two rods. During T4 infection of E.
coli, two gp37 monomers dimerize to form a P37 homodimer; the
process of dimerization is believed to initiate near the
C-terminus of P37 and to require two E. coli chaperon
proteins. (A variant gp37 with a temperature sensitive
mutation near the C-terminus used in the present invention
requires only one chaperon, gp57, for dimerization.) Once
dimerized, the N-terminus of P37 initiates the dimerization

designing units that can assemble into a broad variety of
structures, some of which are detailed below.

The rods of the present invention function like
wooden 2x4 studs or steel beams for construction. In this
case, the surfaces are exactly reproducible at the molecular
level and thereby fitted for specific attachments to similar
or different unit rods at fixed joining sites. The surfaces
are also modified to be more or less hydrophilic, including
positively or negatively charged groups, and have protrusions
built in for specific binding to other units or to an
intermediate joint with two receptor sites. The surfaces of
the rod and a schematic of the unit rod are illustrated in
Figure 2. The three dimensions of the rod are defined as: x,
for the back (B) to front (F) dimension; y, for the down (D)
to up (U) dimension; and z, for the left (L) to right (R)
dimension.

One dimensional multi-unit rods can be most readily
assembled from single unit rods joined along the x axis
(Figure 3A). Regular joining of subunits in either of the
other two dimensions will also form a long structure, but
with different cross sections than in the x dimension.
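
The effect of the joining axis on the resulting structure can
be made concrete with a small sketch. The class and function
names are hypothetical, and the rod dimensions in the usage
note are placeholders chosen only for illustration; the text
gives no dimensions here.

```python
from dataclasses import dataclass

# Axis definitions as given for the unit rod: x runs back to front,
# y runs down to up, z runs left to right.
AXIS_FACES = {
    "x": ("back", "front"),
    "y": ("down", "up"),
    "z": ("left", "right"),
}


@dataclass(frozen=True)
class UnitRod:
    """A unit rod described only by its extent along each axis (nm)."""
    x: float
    y: float
    z: float


def multi_unit_rod(unit: UnitRod, n: int, axis: str = "x"):
    """Return (length, cross_section) for n units joined along one axis.

    The cross section is the pair of remaining dimensions, sorted so that
    results for different joining axes can be compared directly.
    """
    if axis not in AXIS_FACES or n < 1:
        raise ValueError("axis must be x, y or z and n must be at least 1")
    dims = {"x": unit.x, "y": unit.y, "z": unit.z}
    length = n * dims.pop(axis)
    return length, tuple(sorted(dims.values()))
```

With a placeholder unit of 40 x 4 x 3 nm, five units joined
along x give a 200 nm rod with a 3 x 4 nm cross section,
whereas five units joined along z give a 15 nm stack with a
4 x 40 nm cross section.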

Two dimensional constructs are sheets formed by
interaction of rods along any two axes.

1) Closed simple sheets are formed from surfaces which
overlap exactly, along any two axes (Figure 3B).

2) Closed brickwork sheets are formed from interaction
between units that have exactly overlapping surfaces in one
dimension and a special type of overlap in the other
(Figure 3C). In this case there must be two different sets of
complementary joints spaced with exactly 1/2 unit distance
between them. If they are centered (i.e., each set 1/4 from
the end) then each joint will be in the center of the units
above and below. If they are offset, then the joint will be
offset as well. In this construction, the complementary
interacting sites are schematized by * and a second symbol.
If the interacting sites are each symmetric, the
alternating rows can interact with the rods in either
direction. If they are not symmetric, and can only interact
with interacting rows facing in the same or opposite
direction, the sheet will be made of unidirectional rods or
layers of rods in alternating directions.

3) Open brickwork sheets (or nets) result when the units
are separated by more than one-half unit (Figure 3D). The
dimensions of the openings (or pores) depend upon the
distance (dx) separating the interacting sites and the
distance (dy) by which these sites separate the surfaces.

Three dimensional constructs require sterically
compatible interactions between all three surfaces to form
solids.

1) Closed solids can assemble from units that
overlap exactly in all three dimensions (e.g., the exact
overlapping of closed simple sheets). In an analogous
manner, closed brickwork sheets can form closed solids by
overlapping sheets exactly or displaced to bring the
brickwork into the third dimension. This requires an
appropriate set of joints on all three pairs of parallel
faces of the unit.

2) Porous solids are made by joining
open brickwork sheets in various ways. For example, if the
units overlap exactly in the third dimension, a solid is
formed with the array of holes of exact dimensions running
perpendicular to the plane of the paper. If instead a
material is needed with closed spaces, with layers of width
dz (i.e., in the U→D dimension), a simple closed sheet is
layered on the open brickwork sheet to close the openings.
If the overlap of the open brickwork sheet is, e.g., 1/4 unit,
then a rod of length 3/4 units is used to make the sheet.
Joints are then needed in the z dimension. The two units
used to polymerize these alternate layers, and the layers
themselves, are schematized in Figure 4.

WO 96/11947
PCT/US95/13023

All of the above structures are composed of simple
linear rods. A second unit, the angle unit, expands the type
and dimensionality of possible structures. The angle unit
connects two rods at angles different from 180°, akin to an
angle iron. The average angle and its degree of rigidity are
built into this connector structure. For example, the
structure shown in Figure 5 has an angle of 120° and
different specific joining sites at a and at b. The
following are examples of structures that are formed
utilizing angle joints:

1) Brickwork sheets are expanded and
strengthened in the direction normal to the rod direction by
adding angles perpendicular to the sheet. In this case, a
three dimensional network forms. Attachment of 90° angles to
the ends of the rods makes an angle almost in the plane of
the sheet, allowing new rods added to those angles (which
must have some play out of the plane of the original sheet to
attach in the first place) to form a new sheet, almost
parallel, with an orientation normal to its upper or lower
neighbor.

2) Hexagons are made from a mixture of rods and
angle joints that form 120° angles. In this case, there are
two exclusive sets of joints. Each set is made up of one of
the two ends of the rod and one of the two complementary
sites on the angle. This is a linear structure in the sense
that the hexagon has a direction (either clockwise or
counterclockwise). It can be made into a two dimensional
open net (i.e., a two dimensional honeycomb) by joining the
sides of the hexagons. It can form hexagonal tubes by
joining the top of the hexagon below to the bottom face of
the hexagon above. If the tubes also join by their sides,
they will form an open three dimensional multiple hexagonal
tube.

3) Helical hexagonal tubes are made analogously to
hexagons but the sixth unit is not joined to the first to
close the hexagon. Instead, the end is displaced from the
plane of the hexagon and the seventh and further units are
added to form a hexagonal tube which can be a spring if there
is little or no adhesive force between the units of the
helix, or a stiff rod if there is such a force to maintain
the close proximity of apposing units.

It will be apparent to one skilled in the art that
the compositions and methods of the present invention also
encompass other polygonal structures such as octagons, as
well as open solids such as tetrahedrons and icosahedrons
formed from triangles and boxes formed from squares and
rectangles. The range of structures is limited only by the
types of angle units and the substituents that can be
engineered on the different axes of the rod units. For
example, other naturally occurring angles are found in the
fibers of bacteriophage T7, which has a 90° angle (Steven et
al., J. Mol. Biol. 200: 352-365, 1988).
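
The correspondence between joint angle and closed polygon
(120° joints for hexagons, 90° joints for squares) follows
from the interior-angle formula for a regular n-sided polygon,
angle = 180(n - 2)/n, so that n = 360/(180 - angle). The
following sketch applies that formula. It assumes perfectly
rigid, planar joints and identical rods, which is an
idealization, and the function name is hypothetical.

```python
def polygon_sides(joint_angle_deg: float, tol: float = 1e-6):
    """Number of rods in a closed planar regular polygon, or None.

    Uses n = 360 / (180 - angle). Returns None when the angle does not
    close into a regular polygon with a whole number of sides.
    """
    if not 0 < joint_angle_deg < 180:
        return None
    n = 360.0 / (180.0 - joint_angle_deg)
    nearest = round(n)
    if nearest >= 3 and abs(n - nearest) < tol:
        return nearest
    return None
```

Under these assumptions, 60°, 90°, 120° and 135° joints close
into triangles, squares, hexagons and octagons respectively,
while an angle such as 100° does not close into a regular
planar polygon.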

DESIGN AND PRODUCTION OF THE ROD PROTEINS

The protein subunits that are used to construct the
nanostructures of the present invention are based on the four
polypeptides that comprise the tail fibers of bacteriophage
T4, i.e., gp34, gp35, gp36 and gp37. The genes encoding
these proteins have been cloned, and their DNA and protein
sequences have been determined (for genes 36 and 37 see Oliver
et al., J. Mol. Biol. 153: 545-568, 1981). The DNA and amino
acid sequences of genes 34, 35, 36 and 37 are set forth in
Figures 6 and 7 below.

gp34, gp35, gp36, and gp37 are produced naturally
following infection of E. coli cells by intact T4 phage
particles. Following synthesis in the cytoplasm of the
bacterial cell, the gp34, 36, and 37 monomers form
homodimers, which are competent for assembly into maturing
phage particles. Thus, E. coli serves as an efficient and
convenient factory for synthesis and dimerization of the
protein subunits described herein below.

In practicing the present invention, the genes
encoding the proteins of interest (native, modified, or
recombined) are incorporated into DNA expression vectors that
are well known in the art. These circular plasmids typically
contain selectable marker genes (usually conferring
antibiotic resistance to transformed bacteria), sequences
that allow replication of the plasmid to high copy number in
E. coli, and a multiple cloning site immediately downstream
of an inducible promoter and ribosome binding site. Examples
of commercially available vectors suitable for use in the
present invention include the pET system (Novagen, Inc.,
Madison, WI) and Superlinker vectors pSE280 and pSE380
(Invitrogen, San Diego, CA).

The strategy is to 1) construct the gene of
interest and clone it into the multiple cloning site; 2)
transform E. coli cells with the recombinant plasmid; 3)
induce the expression of the cloned gene; 4) test for
synthesis of the protein product; and, finally, 5) test for
the formation of functional homodimers. In some cases,
additional genes are also cloned into the same plasmid, when
their function is required for dimerization of the protein of
interest. For example, when wild-type or modified versions
of gp37 are expressed, the bacterial chaperon gene 57 is also
included; when wild-type or modified gp36 is expressed, the
wild-type version or a modified version of the gp37 gene is
included. The modified gp37 should have the capacity to
dimerize and contain an N-terminus that can chaperon the
dimerization of gp36. This method allows the formation of
monomeric gene products and, in some cases, maturation of
monomers to homodimeric rods in the absence of other
phage-induced proteins normally present in a T4-infected
cell.
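
The co-expression rule described above can be written as a
small lookup. This is a sketch only: the table holds just the
two dependencies named in the text, the function name is
hypothetical, and following the chain (gp36 needs the gp37
gene, which in turn needs gene 57) is an assumption that the
text does not state explicitly, so it is exposed as an option.

```python
# Helper genes named in the text: gp37 expression includes gene 57;
# gp36 expression includes the gp37 gene (wild-type or modified).
HELPERS = {"gp37": ["gene 57"], "gp36": ["gp37"]}


def genes_to_clone(target: str, follow_chain: bool = True) -> list:
    """List the genes to place on the same plasmid for a target protein.

    follow_chain=True also adds the helpers of each helper, which is an
    assumption of this sketch rather than a statement from the text.
    """
    ordered = [target]
    queue = [target]
    while queue:
        gene = queue.pop(0)
        for helper in HELPERS.get(gene, []):
            if helper not in ordered:
                ordered.append(helper)
                if follow_chain:
                    queue.append(helper)
    return ordered
```

A protein with no entry in the table, such as gp34 here, is
returned on its own.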

Steps 1-4 of the above-defined strategy are
achieved by methods that are well known in the art of
recombinant DNA technology and protein expression in
bacteria. For example, in step 1, restriction enzyme
cleavage at multiple sites, followed by ligation of
fragments, is used to construct deletions in the internal rod
segment of gp34, 36, and 37 (see Example 1 below).
Alternatively, a single or multiple restriction enzyme
cleavage, followed by exonuclease digestion (EXO-SIZE, New
England Biolabs, Beverly, MA), is used to delete DNA
sequences in one or both directions from the initial cleavage
site; when combined with a subsequent ligation step, this
procedure produces a nested set of deletions of increasing
sizes. Similarly, standard methods are used to recombine DNA
segments from two different tail fiber genes, to produce
chimeric genes encoding fusion proteins (called "chimers" in
this description). In general, this last method is used to
provide alternate N- or C-termini and thus create novel
combinations of ends that enable new patterns of joining of
different rod segments. A representative of this type of
chimer, the fusion of gp37-36, is described in Example 2.

The preferred hosts for production of these proteins (Step 2)
are E. coli strains BL21(DE3) and BL21(DE3/pLysS) (available
commercially from Novagen, Madison, WI), although other
compatible recA strains, such as HMS174(DE3) and
HMS174(DE3/pLysS), can be used. Transformation with the
recombinant plasmid (Step 2) is accomplished by standard
methods (Sambrook, J., Molecular Cloning, Cold Spring Harbor
Laboratories, Cold Spring Harbor, NY; this is also the source
for standard recombinant DNA methods used in this invention).
Transformed bacteria are selected by virtue of their
resistance to antibiotics, e.g., ampicillin or kanamycin. The
method by which expression of the cloned tail fiber genes is
induced (Step 3) depends upon the particular promoter used.
A preferred promoter is plac (with a lacIq on the vector to
reduce background expression), which can be regulated by the
addition of isopropylthiogalactoside (IPTG). A second
preferred promoter is pT7φ10, which is specific to T7 RNA
polymerase and is not recognized by E. coli RNA polymerase.
T7 RNA polymerase, which is resistant to rifamycin, is
encoded on the defective lambda DE3 lysogen in the E. coli
BL21 chromosome. T7 polymerase in BL21(DE3) is
super-repressed by the lacIq gene in the plasmid and is
induced and regulated by IPTG.

Typically, a culture of transformed bacteria is
incubated with the inducer for a period of hours, during
which the synthesis of the protein of interest is monitored.
In the present instance, extracts of the bacterial cells are
prepared, and the T4 tail fiber proteins are detected, for
example, by SDS-polyacrylamide gel electrophoresis.

Once the modified protein is detected in bacterial
extracts, it is necessary to ascertain whether or not it
forms appropriate homodimers (Step 5). This is accomplished
initially by testing whether the protein is recognized by an
antiserum specific to the mature dimerized form of the
protein.

Tail fiber-specific antisera are prepared as
described (Edgar, R.S. and Lielausis, I., Genetics 52: 1187,
1965; Ward et al., J. Mol. Biol. 54:15, 1970). Briefly, whole
T4 phage are used as an immunogen; optionally, the resulting
antiserum is then adsorbed with tail-less phage particles,
thus removing all antibodies except those directed against
the tail fiber proteins. In a subsequent step, different
aliquots of the antiserum are adsorbed individually with
extracts that each lack a particular tail fiber protein. For
example, if an extract containing only tail fiber components
P34, gp35, and gp36 (derived from a cell infected with a
mutant T4 lacking a functional gp37 gene) is used for
adsorption, the resulting antiserum will recognize only
mature P37 and dimerized P36-P37. A similar approach may be
used to prepare individual antisera that recognize only
mature (i.e., homodimerized) P34 and P36 by adsorbing with
extracts containing distal half tail fibers or P34, gp35 and
P37, respectively. An alternative is to raise antibody
against purified tail fiber halves, e.g., P34 and
gp35-P36-P37. Anti-gp35-P36-P37 can then be adsorbed with
P36-P37 to produce anti-gp35, and anti-P36 can be produced by
adsorption with P37 and gp35. Anti-P37, anti-gp35, and
anti-P34 can also be produced directly by using purified P37,
gp35, and P34 as immunogens. Another approach is to raise
specific monoclonal antibodies against the different tail
fiber components or segments thereof.
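
The adsorption logic for the purified-half approach amounts to
a set difference: the specificities remaining in an antiserum
are those of the immunogen minus those of the adsorbent. The
sketch below models only that bookkeeping. The names are
hypothetical, and treating each component as a single,
non-cross-reacting specificity is a simplifying assumption;
real polyclonal sera are not this clean.

```python
def adsorb(antiserum: frozenset, *adsorbents: str) -> frozenset:
    """Remove the specificities matched by the adsorbing components."""
    return antiserum - frozenset(adsorbents)


# Antiserum raised against the purified gp35-P36-P37 half fiber,
# modeled as one specificity per component (an assumption).
ANTI_GP35_P36_P37 = frozenset({"gp35", "P36", "P37"})

# Adsorption with P36-P37 leaves anti-gp35.
anti_gp35 = adsorb(ANTI_GP35_P36_P37, "P36", "P37")

# Adsorption with P37 and gp35 leaves anti-P36.
anti_p36 = adsorb(ANTI_GP35_P36_P37, "P37", "gp35")
```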

Specific antibodies to subunits or tail parts are
used in any of the following ways to detect appropriately
homodimerized tail fiber proteins: 1) Bacterial colonies are
screened for those expressing mature tail fiber proteins by
directly transferring the colonies, or, alternatively,
samples of lysed or unlysed cultures, to nitrocellulose
filters, lysing the bacterial cells on the filter if
necessary, and incubating with specific antibodies.
Formation of immune complexes is then detected by methods
widely used in the art (e.g., secondary antibody conjugated
to a chromogenic enzyme or radiolabelled Staphylococcal
Protein A). This method is particularly useful to screen
large numbers of colonies, e.g., those produced by EXO-SIZE
deletion as described above. 2) Bacterial cells expressing
the protein of interest are first metabolically labelled with
35S-methionine, followed by preparation of extracts and
incubation with the antiserum. The immune complexes are then
recovered by incubation with immobilized Protein A followed
by centrifugation, after which they may be resolved by
SDS-polyacrylamide gel electrophoresis.

An alternative competitive assay for testing
whether internally deleted tail fiber proteins that do not
permit phage infection nonetheless retain the ability to
dimerize and associate with their appropriate partners
utilizes an in vitro complementation system. 1) A bacterial
extract containing the modified protein of interest, as
described above, is mixed with a second extract prepared from
cells infected with a T4 phage that is mutant in the gene of
interest. 2) After several hours of incubation, a third
extract is added that contains the wild-type version of the
protein being tested, and incubation is continued for several
additional hours. 3) Finally, the extract is titered for
infectious phage particles by infecting E. coli and
quantifying the phage plaques that result. A modified tail
fiber protein that is correctly dimerized and able to join
with its partners is incorporated into tail fibers in a
non-functional manner in Step 1, thereby preventing the
incorporation of the wild-type version of the protein in Step
2; the result is a reduction in the titer of the resulting
phage sample. By contrast, if the modified protein is unable
to dimerize and thus form proper N- and/or C-termini, it will
not be incorporated into phage particles in Step 1, and thus
will not compete with assembly of intact phage particles in
Step 2; the phage titer should thus be equivalent to that
observed when no modified protein is added in Step 1 (a
negative control).
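
The readout of this assay is a comparison of two titers, which
can be sketched as follows. The function name and the two-fold
cutoff are assumptions made for illustration; the text states
only that a competing protein reduces the titer relative to the
negative control and gives no numerical threshold.

```python
def interpret_complementation(titer_with_modified: float,
                              titer_control: float,
                              min_fold_reduction: float = 2.0) -> str:
    """Classify a modified protein from in vitro complementation titers.

    titer_control is the titer observed when no modified protein is added
    in Step 1. The default cutoff is a placeholder, not a value from the
    text.
    """
    if titer_control <= 0 or titer_with_modified < 0:
        raise ValueError("titers must be non-negative and the control positive")
    if titer_with_modified == 0:
        return "competes"
    fold = titer_control / titer_with_modified
    return "competes" if fold >= min_fold_reduction else "does not compete"
```

A result of "competes" corresponds to a protein that dimerized
and was incorporated in Step 1; "does not compete" corresponds
to a titer equivalent to the negative control.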

Another way in which to test whether chimers and
internally deleted tail fiber proteins retain the ability to
dimerize and associate with their appropriate partners is
done in vivo. The assay detects the ability of such chimers
and deleted proteins to compete with normal phage parts for
assembly, thus reducing the burst size of a wild-type phage
infecting the same host cell in which the chimers or deleted
proteins are recombinantly expressed. Thus, expression from
an expression vector encoding the chimer or deleted protein
is induced inside a cell, which cell is then infected by a
wild-type phage. Inhibition of wild-type phage production
demonstrates the ability of the recombinant chimer or protein
to associate with the appropriate tail fiber proteins of the
phage.

The above-described methods are used, alone and in
combination, in the design and production of different types
of modified tail fiber proteins. For example, a preliminary
screen of a large number of bacterial colonies for those
expressing a properly dimerized protein will identify
positive colonies, which can then be individually tested by
in vitro complementation.

Non-limiting examples of novel proteins that are
encompassed by the present invention include:

SO 1) Internally deleted gp34, 36, and 37 

polypeptides (See Example l below); 

2) A C-terminally truncated gp36 fused to the N- 
terminus of N-terminally truncated gp37; 

3) A fusion between gp36 and gp37 in which gp37 is 
3S N-terminal to gp36 (i.e., in reverse of the natural order), 

termed herein **gp37-36 chimer" (See Example 2 below) ; 
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4) A fusion bet%fsen gp34 and gp36 in imich gp36 is 
N-terainal to 9p34 in reverse of the natural order), 
termed herein **gp36'34 chieer"; 

5) A variant of gp36 in which the C-terminus is 
S Mutated such that it lacks the capability to interact with 

(and diserize in response to) the N-tersinus of wild-type 
P37, tereed herein "gp36*"; 

6) A variant of gp37 In idiich the N-teminus is 
mitated such that it foms a P37 that lacks the capability to 

10 interact with the C->terminus of wild-type 9p36, termed herein 
"*P37"; 

7) Variants of gp36* and *P37 that can interact 
with each other, but not with gp36 or P37. 

8) A variant "P37-36 chimer" in which the gp36
moiety is derived from the variant as in 5), i.e., "P37-36*".
(For 5-8, See Example 3 below.)

9) A variant "P37-36 chimer" in which the gp37
moiety is derived from the variant as in 6) above, i.e.,
"*P37-36".

10) A variant P37-36 chimer, *P37-P36*, in which
the gp36 and gp37 moieties are derived from the variants in
7).

11) A fusion between gp36 and gp34 in which gp36
sequences are placed N-terminal to gp34, the dimer of which
is termed herein "P36-34 chimer";

12) Variants of gp35 that form average angles
different from 137° or 158° (the native angle), e.g., less
than about 125° or more than about 145° under conditions
wherein the wild-type gp35 protein forms an angle of 137°
when combined with the P34 and P36-P37 dimers, and/or exhibit
more or less flexibility than the native polypeptide;

13) Variants of gp34, 35, 36 and 37 that exhibit 
thermolabile interactions or other variant specific 
interactions with their cognate partners; and 

14) Variants of gp37 in which the C-terminal
domain of the polypeptide is modified to include sequences
that confer specific binding properties on the entire
molecule, e.g., sequences derived from avidin that recognize
biotin, sequences derived from immunoglobulin heavy chain
that recognize Staphylococcal A protein, sequences derived
from the Fab portion of the heavy chain of monoclonal
antibodies to which their respective Fab light chain
counterparts could attach and form an antigen-binding site,
immunoactive sequences that recognize specific antibodies, or
sequences that bind specific metal ions. These ligands may
be immobilized to facilitate purification and/or assembly.

In specific embodiments, the chimers of the
invention comprise a portion consisting of at least the first
10 (N-terminal) amino acids of a first tail fiber protein
fused via a peptide bond to a portion consisting of at least
the last 10 (C-terminal) amino acids of a second tail fiber
protein. The first and second tail fiber proteins can be the
same or different proteins. In another embodiment, the
chimers comprise an amino acid portion in the range of the
first 10-60 amino acids from a tail fiber protein fused to an
amino acid portion in the range of the last 10-60 amino acids
from a second tail fiber protein. In another embodiment,
each amino acid portion is at least 20 amino acids of the
tail fiber protein. The chimers comprise portions, i.e., not
full-length tail fiber proteins, fused to one another. In a
preferred aspect, the first tail fiber protein portion of the
chimer is from gp37, and the second tail fiber protein
portion is from gp36. Such a chimer (gp37-36 chimer), after
oligomerization to form P37-36, can polymerize to other
identical oligomers. A gp36-34 chimer, after oligomerization
to form P36-34, can bind to gp35, and this unit can then
polymerize. In another embodiment, the first portion is from
gp37, and the second portion is from gp34. In a preferred
aspect, the chimers of the invention are made by insertions
or deletions within a β turn of the structure of the tail
fiber proteins. Most preferably, insertions into a tail
fiber sequence, or fusing to another tail fiber protein
sequence (preferably via manipulation at the recombinant DNA
level to produce the desired encoded protein), is done so that
sequences in turns on the same edge of the β-sheet are
joined.
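
The portion-based definition of a chimer given above can be
sketched in a few lines of Python. This is only an
illustration under stated assumptions: the function name
make_chimer, its parameters, and the two placeholder
sequences are hypothetical and are not the real gp37 or gp36
sequences; only the "at least 10 residues, not full-length"
constraints are taken from the text.

```python
def make_chimer(first_protein: str, second_protein: str,
                n_first: int, n_last: int, min_len: int = 10) -> str:
    """Fuse the first n_first residues of one protein to the
    last n_last residues of another (one-letter amino acid codes)."""
    if n_first < min_len or n_last < min_len:
        raise ValueError(f"each portion needs at least {min_len} residues")
    # The text specifies portions, i.e., not full-length proteins.
    if n_first >= len(first_protein) or n_last >= len(second_protein):
        raise ValueError("portions must be shorter than the full-length proteins")
    return first_protein[:n_first] + second_protein[-n_last:]


# Placeholder sequences (hypothetical, NOT real tail fiber sequences).
toy_a = "ACDEFGHIKLMNPQRSTVWY" * 2   # 40 residues
toy_b = "YWVTSRQPNMLKIHGFEDCA" * 2   # 40 residues

chimer = make_chimer(toy_a, toy_b, n_first=12, n_last=15)
print(len(chimer))  # 27
```

In practice the fusion is made at the DNA level, as the text
states; the string slicing here only mirrors the resulting
protein-level arrangement.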

In addition to the above-described chimers,
nanostructures of the invention can also comprise tail fiber
protein deletion constructs that are truncated at one end,
e.g., are lacking an amino- or carboxy- end (of at least 5 or
10 amino acids) of the molecule. Such molecules truncated at
the amino-terminus, e.g., of truncated gp37, gp34, or gp36,
can be used to "cap" a nanostructure, since, once
incorporated, they will terminate polymerization. Such
molecules preferably comprise a fragment of a tail fiber
protein lacking at least the first 10, 20, or 60 amino
terminal amino acids.

In order to change the length of the rod component
proteins as desired, portions of the same or different tail
fiber proteins can be inserted into a tail fiber chimer to
lengthen the rod, or be deleted from a chimer, to shorten the
rod.

ASSEMBLY OF INDIVIDUAL ROD COMPONENTS INTO NANOSTRUCTURES

Expression of the proteins of the present invention
in E. coli as described above results in the synthesis of
large quantities of protein, and allows the simultaneous
expression and assembly of different components in the same
cells. The methods for scale-up of recombinant protein
production are straightforward and widely known in the art,
and many standard protocols can be used to recover native and
modified tail fiber proteins from a bacterial culture.

In a preferred embodiment, native (nonrecombinant)
gp35 is isolated for use by growing up a bacteriophage T4
having an amber mutation in gene 36, in a su° bacterial
strain (not an amber suppressor), and isolating gp35 from the
resulting culture by standard methods.

P34, P36-P37, P37, and chimers derived from them
are purified from E. coli cultures as mature dimers. Gp35 and
variants thereof are purified as monomers. Purification is
achieved by the following procedures or combinations thereof,
using standard methods: 1) chromatography on molecular
sieve, ion-exchange, and/or hydrophobic matrices;
2) preparative ultracentrifugation; and 3) affinity
chromatography, using as the immobilized ligand specific
antibodies or other specific binding moieties. For example,
the C-terminal domain of P37 binds to the lipopolysaccharide
of E. coli B. Other T4-like phages have P37 analogues that
bind other cell surface components such as OmpF or TSX
protein. Alternatively, if the proteins have been engineered
to include heterologous domains that act as ligands or
binding sites, the cognate partner is immobilized on a solid
matrix and used in affinity purification. For example, such
a heterologous domain can be biotin, which binds to a
streptavidin-coated solid phase.

Alternatively, several components are co-expressed
in the same bacterial cells, and sub-assemblies of larger
nanostructures are purified subsequent to limited in vivo
assembly, using the methods enumerated above.

The purified components are then combined in vitro
under conditions where assembly of the desired nanostructure
occurs at temperatures between about 4°C and about 37°C, and
at pHs between about 5 and about 9. For a given
nanostructure, optimal conditions for assembly (i.e., type
and concentration of salts and metal ions) are easily
determined by routine experimentation, such as by changing
each variable individually and monitoring formation of the
appropriate products.
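
The "change each variable individually" strategy can be
sketched as a simple one-factor-at-a-time screen. The sketch
below is only illustrative: one_factor_screen and mock_yield
are hypothetical names, and mock_yield is an invented scoring
function standing in for a real assay readout (for example,
the amount of correctly assembled product). The candidate
temperatures and pH values fall within the ranges stated
above; the salt values are arbitrary placeholders.

```python
def one_factor_screen(baseline, candidates, measure_yield):
    """Vary one condition at a time, keeping the best value found
    for each before moving on to the next condition."""
    best = dict(baseline)
    for name, values in candidates.items():
        best[name] = max(values, key=lambda v: measure_yield({**best, name: v}))
    return best


def mock_yield(c):
    # Invented response surface (assumption): peaks at 20 C, pH 7, 150 mM salt.
    return (-((c["temp_C"] - 20) ** 2) / 100
            - (c["pH"] - 7) ** 2
            - abs(c["NaCl_mM"] - 150) / 100)


baseline = {"temp_C": 30, "pH": 6, "NaCl_mM": 50}
candidates = {
    "temp_C": [4, 20, 30, 37],   # within about 4-37 C
    "pH": [5, 6, 7, 8, 9],       # within about pH 5-9
    "NaCl_mM": [50, 150, 300],   # placeholder salt concentrations
}
best = one_factor_screen(baseline, candidates, mock_yield)
print(best)  # {'temp_C': 20, 'pH': 7, 'NaCl_mM': 150}
```

A one-factor-at-a-time screen ignores interactions between
variables; it is shown here only because it matches the
procedure the text describes.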

Alternatively, one or more crude bacterial extracts
may be prepared, mixed, and assembly reactions allowed to
proceed prior to purification.

In some cases, one or more purified components
assemble spontaneously into the desired structure, without
the necessity for initiators. In other cases, an initiator
is required to nucleate the polymerization of rods or sheets.
This offers the advantage of localizing the assembly process
(i.e., if the initiator is immobilized or otherwise
localized) and of regulating the dimensions of the final

Proteins which catalyze the formation of correct
(lowest energy) stable secondary (2°) structure of proteins
are called chaperone proteins. (Often, especially in
globular proteins, this stabilization is aided by tertiary
structure, e.g., stabilization of β-sheets by their
interaction in β-barrels or by interaction with α-helices).
Normally chaperonins prevent intrachain or interchain
interactions which would produce untoward metastable folding
intermediates and prevent or delay proper folding. There are
two known accessory proteins, gp57 and gp38, in the
morphogenesis of T4 phage tail fibers which are sometimes
called chaperonins because they are essential for proper
maturation of the protein oligomers but are not present in
the final structures.

The usual chaperonin systems (e.g., groEL/ES)
interact with certain oligopeptide moieties of the gene
product to prevent unwanted interactions with oligopeptide
moieties elsewhere on the same polypeptide or another
peptide. These would form metastable folding intermediates
which retard or prevent proper folding of the polypeptide to
its native (lower energy) state.

Gp57, probably in conjunction with some membrane
protein(s), has the role of juxtaposing (and aligning) and/or
initiating the folding of 2 or 3 identical gp37 molecules.
The aligned peptides then zip up (while mutually stabilizing
their nascent β-structures) to form a beam, without further
interaction with gp57. Gp57 acts in T4 assembly not only for
oligomerization of gp37 but also for gp34 and gp12.

STRUCTURAL COMPONENTS FOR SELF-ASSEMBLY OF BEAMS IN VITRO

Alternatively to starting the polymerization of
chimers with the use of a preformed chimeric or natural
oligomeric unit called an initiator produced in vivo,
molecules (preferably peptides) that can self-assemble can be
produced as fusion proteins, fused to the N- or C-terminus of
tail fiber variants of the invention (chimers,
deletion/insertion constructs) to align their ends and thus
to facilitate their subsequent unaided folding into
oligomeric, stable β-folded rod-like (beam) units in vitro,
in the absence of the normally required chaperonin proteins
(e.g., gp57) and host cell membrane proteins.

As an illustration, consider the P37 unit as an
initiator of gp37-36 oligomerization and polymerization.
Normally, proper folding of gp37 to a P37 initiator requires
a phage infected cell membrane, and two chaperone proteins,
gp38 and gp57. In a preferred embodiment, the need for gp38
can be obviated by use of a mutation, ts3813 (a duplication
of 7 residues just downstream of the transition zone of gp37)
which suppresses gene 38 (Wood, F.A. Eiserling and R.A.
Crowther, 1994, "Long Tail Fibers: Genes, Proteins,
Structure, and Assembly," in Molecular Biology of
Bacteriophage T4 (Jim D. Karam, Editor) American Society for
Microbiology, Washington, D.C., pp 282-290). If a moiety
that self-assembles into a dimer or trimer or other oligomer
("self-assembling moiety") is fused to a C-terminal deletion
of gp37 downstream or upstream of the transition region [the
transition region is a conserved 17 amino acid residue region
in T4-like tail fiber proteins where the structure of the
protein narrows to a thin fiber; see Henning et al., 1994,
"Receptor recognition by T-even-type coliphages," in
Molecular Biology of Bacteriophage T4, Karam (ed.), American
Society for Microbiology, Washington, D.C., pp. 291-298; Wood
et al., 1994, "Long tail fibers: Genes, proteins, structure,
and assembly," in Molecular Biology of Bacteriophage T4,
Karam (ed.), American Society for Microbiology, Washington,
D.C., pp. 282-290], then, when it is expressed, the
self-assembling moiety will oligomerize in parallel and thus
align the fused gp37 peptides, permitting them to fold in
vitro, in the absence of other chaperonin proteins.

If P37 is a dimer (Figure 8A), the self-assembling
moiety can be a self dimerizing peptide such as the leucine
zipper, made from residues 250-281 from the yeast
transcription factor, GCN4 (E.K. O'Shea, R. Rutkowski and P.S.
Kim, Science 243:538, 1989) or the self dimerizing mutant
leucine zipper peptide, pIL, in which the a positions are
substituted with isoleucine and the d positions with leucine
(Harbury P.B., T. Zhang, P.S. Kim and T. Alber. 1993. A
Switch Between Two-, Three-, and Four-Stranded Coiled Coils
in GCN4 Leucine Zipper Mutants. Science, 262:1401-1407). If
P37 is a trimer (Figure 8B), the self-assembling moiety can
be a self trimerizing mutant leucine zipper peptide, pII, in
which both the a and d positions are substituted with
isoleucine (Harbury P.B., et al. ibid). Alternatively, a
collagen peptide can be used as the self-assembling moiety,
such as that described by Bella et al. (J. Bella, M. Eaton,
B. Brodsky and H.M. Berman. 1994. Crystal and Molecular
Structure of a Collagen-Like Peptide at 1.9Å Resolution.
Science, 266:75-81), which self aligns by an inserted
specific non repeating alanine residue near the center.
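
The a/d substitution patterns described for the pIL and pII
zipper variants can be illustrated with a small Python
sketch. Everything named here is an assumption made for
illustration: substitute_core is a hypothetical helper, and
the 14-residue toy sequence is invented (it is not the GCN4
zipper sequence); its heptad register is simply declared to
start at position a.

```python
HEPTAD = "abcdefg"  # conventional labels for the seven heptad positions


def substitute_core(seq: str, first_position: str, a_to: str, d_to: str) -> str:
    """Replace residues at heptad positions a and d.

    first_position is the heptad label of seq[0] (assumed known).
    """
    start = HEPTAD.index(first_position)
    out = []
    for i, residue in enumerate(seq):
        label = HEPTAD[(start + i) % 7]
        if label == "a":
            out.append(a_to)
        elif label == "d":
            out.append(d_to)
        else:
            out.append(residue)
    return "".join(out)


toy = "VKELEDKVKELEDK"  # hypothetical two-heptad peptide, register starts at 'a'
pIL_like = substitute_core(toy, "a", a_to="I", d_to="L")  # a -> Ile, d -> Leu
pII_like = substitute_core(toy, "a", a_to="I", d_to="I")  # a and d -> Ile
print(pIL_like)  # IKELEDKIKELEDK
print(pII_like)  # IKEIEDKIKEIEDK
```

Applying this to a real zipper requires the correct heptad
register for that sequence, which is not given in the text.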

Self-assembling moieties can be used to make
initiators for polymerizations in the absence of the normal
initiators. For example, to create an initiator for
oligomerization and polymerization of the chimeric monomer,
gp37-36, gp37-36-C2 is used as illustrated in Figure 9.
(C2 means that a dimer forming peptide is fused to the
C-terminus of the gp36 moiety. This is used if the beam is a
dimeric structure. Otherwise C3, a trimer forming peptide
fused to the C-terminus, would be used.) Furthermore, use
of the E. coli lac repressor N-terminus, e.g., which
associates as a tetramer, with two coils facing in each
direction, could join two dimers (or polymers of dimers) end
to end, either at their N- or C-termini depending upon which
end the self-assembling peptides were placed. They could
also join N- to C-termini. In any case, alone, they could
only form a dimer, each end of which would be extensible by
adding an appropriate chimer monomer (as shown for the
simpler case in Figure 9).

In an alternative embodiment, the self-assembling
moiety can be fused to the N-termini of the chimer. In a
specific embodiment, the self-assembling moiety is fused to
at least a 10 amino acid portion of a T-even-like tail fiber
protein.

A self-assembling moiety that assembles into a
heterooligomer can also be used. For example, if
polymerization between beams is directed by the surface of a
dimeric cross-β surface, addition of a heterodimeric unit
with one surface which does not promote further
polymerization would be very useful to cap the penultimate
unit and thus terminate polymerization. If the two types of
coiled regions of the self-assembling moiety are much more
attractive to each other than to themselves, then all of the
dimers will be heterodimers. Such is the case for the
N-terminal Jun and Fos leucine zipper regions.

A further advantage to such heterodimeric units is
the ability to stage polymerization and thus build one unit
(or one surface in a 2D array) at a time. For example,
suppose surface A attaches to B but neither attaches to
itself ([A<->B] is used to symbolize this type of
interaction). Mix A/A and B₀/B (B₀ is attached to a matrix
for easy purification). This will form B₀/B-A/A. Now wash
out A/A and add B/B. The construct is now B₀/B-A/A-B/B. Now
add A/A₀. The construct is now B₀/B-A/A-B/B-A/A₀ and no more
beams can be added. There are of course many other
possibilities.
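
The staged addition just described can be mimicked with a
short Python sketch. The representation is an assumption
made for illustration: each beam is a pair of end surfaces,
surfaces "A" and "B" bind only each other, and a trailing
"0" (as in "B0" or "A0") marks an inert end, here taken to
be the matrix-anchored end or a capping end. The names
COMPATIBLE and add_unit are hypothetical.

```python
# Only A-B contacts are allowed; A-A, B-B, and any "0" end are inert.
COMPATIBLE = {("A", "B"), ("B", "A")}


def add_unit(chain, unit):
    """Append unit = (left_surface, right_surface) to the chain if the
    chain's free end can bind the unit's left surface."""
    free_end = chain[-1][1]
    if (free_end, unit[0]) in COMPATIBLE:
        chain.append(unit)
        return True
    return False


chain = [("B0", "B")]                  # matrix-bound starter unit
results = [
    add_unit(chain, ("A", "A")),       # binds the free B end
    add_unit(chain, ("A", "A")),       # rejected: A does not bind A
    add_unit(chain, ("B", "B")),       # binds the free A end
    add_unit(chain, ("A", "A0")),      # binds, and caps the rod
    add_unit(chain, ("B", "B")),       # rejected: the A0 end is inert
]
print("-".join(f"{left}/{right}" for left, right in chain))
# B0/B-A/A-B/B-A/A0
```

Because each addition step extends the rod by exactly one
unit, the wash steps between additions control the final
length.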

APPLICATIONS

The uses of the nanostructures of the present
invention are manifold and include applications that require
highly regular, well-defined arrays of fibers, cages, or
solids, which may include specific attachment sites that
allow them to associate with other materials.

In one embodiment, a three-dimensional hexagonal 
array of tubes is used as a molecular sieve or filter, 
providing regular vertical pores of precise diameter for 

selective separation of particles by size. Such filters can
be used for sterilization of solutions (i.e., to remove
microorganisms or viruses), or as a series of
molecular-weight cut-off filters. In this case, the protein
components of the pores may be modified so as to provide
specific surface properties (i.e., hydrophilicity or
hydrophobicity, ability to bind specific ligands, etc.).
Among the advantages of this type of filtration device is the
uniformity and linearity of pores and the high pore to matrix
ratio.

In another embodiment, long one-dimensional fibers
are incorporated, for example, into paper or cement or
plastic during manufacture to provide added wet and dry
tensile strength.

In still another embodiment, different 
nanostructure arrays are impregnated into paper and fabric as 
anti-counterfeiting markers. In this case, a simple
color-linked antibody reaction (such as those commercially
available in kits) is used to verify the origin of the
material. Alternatively, such nanostructure arrays could
bind dyes or other substances, either before or after
incorporation, to color the paper or fabrics or modify their
appearance or properties in other ways.

KITS

The invention also provides kits for making 
nanostructures, comprising in one or more containers the 
chimers and deletion constructs of the invention. For
example, one such kit comprises in one or more containers
purified gp35 and purified gp36-34 chimer. Another such kit
comprises purified gp37-36 chimer.

The following examples are intended to illustrate
the present invention without limiting its scope.

In the examples below, all restriction enzymes, 
nucleases, ligases, etc. are commercially available from 
numerous commercial sources, such as New England Biolabs
(NEB), Beverly, MA; Life Technologies (GIBCO-BRL),
Gaithersburg, MD; and Boehringer Mannheim Corp. (BMC),
Indianapolis, IN.

The gene encoding gp37 contains two sites for the
restriction enzyme Bgl II, the first cleavage occurring after
nucleotide 293 and the second after nucleotide 1486 (the
nucleotides are numbered from the initiator methionine codon
ATG). Thus, digestion of a DNA fragment encoding gp37 with
Bgl II, excision of the intervening fragment (nucleotides
294-1485) and re-ligation of the 5' and 3' fragments results
in the formation of an internally deleted gp37, designated
ΔP37, in which arginine-98 is joined with serine-497.



The restriction digestion reaction mix contains:

gp37 plasmid DNA (1 µg/µl)    2 µl
NEB buffer #2 (10X)           1 µl
H2O                           6 µl
Bgl II (10 U/µl)              1 µl


The gp37 plasmid signifies a pT7-5 plasmid into which gene 37
has been inserted in the multiple cloning site, downstream of
a good ribosome binding site and of gene 57 to chaperon the
dimerization. The reaction is incubated for 1 h at 37°C.
Then, 89 µl of T4 DNA ligase buffer and 1 µl of T4 DNA ligase
are added, and the reaction is continued at 16°C for 4 hours.
2 µl of the Stu I restriction enzyme are then added, and
incubation continued at 37°C for 1 h. (The Stu I restriction
enzyme digests residual plasmids that were not cut by Bgl II
in the first step, reducing their transformability by about
100-fold.)

The reaction mixture is then transformed into E.
coli strain BL21, obtained from Novagen, using standard
procedures. The transformation mixture is plated onto
nutrient agar containing 100 µg/ml ampicillin, and the plates
are incubated overnight at 37°C.

Colonies that appear after overnight incubation are
picked, and plasmid DNA is extracted and digested with Bgl II
as above. The restriction digests are resolved on 1% agarose
gels. A successful deletion is evidenced by the appearance
after gel electrophoresis of a new DNA fragment of 4.2 kbp,
representing the undeleted part of gene 37 which is still
attached to the plasmid and which re-formed a Bgl II site by
ligation. The 1.2 kbp DNA fragment bounded by Bgl II sites in
the original gene is no longer in the plasmid and so is
missing from the gel.

Plasmids selected for the predicted deletion as
above are transformed into E. coli strain BL21(DE3).
Transformants are grown at 30°C until the density (A600) of the
culture reaches 0.6. IPTG is then added to a final
concentration of 0.4 mM and incubation is continued at 30°C
for 2 h, after which the cultures are chilled on ice. 20 µl
of the culture is then removed and added to 20 µl of a
two-fold concentrated "cracking buffer" containing 1% sodium
dodecyl sulfate, glycerol, and tracking dye. 15 µl of this
solution are loaded onto a 10% polyacrylamide gel; a second
aliquot of 15 µl is first incubated in a boiling water bath
for 3 min and then loaded on the same gel. After
electrophoresis, the gel is fixed and stained. Expression of
the deleted gp37 is evidenced by the appearance of a protein
species migrating at an apparent molecular mass of 65-70,000
daltons in the boiled sample. The extent of dimerization is
suggested by the intensity of higher-molecular mass species
in the unboiled sample and/or by the disappearance of the
65-70,000 dalton protein band.

The ability of the deleted polypeptide to dimerize
appropriately is directly evaluated by testing its ability to
be recognized by an anti-P37 antiserum that reacts only with
mature P37 dimers, using a standard protein immunoblotting
procedure.

An alternative assay for functional dimerization of
the deleted P37 polypeptide (also referred to as ΔP37) is its
ability to complement in vivo a T4 37⁻ phage, by first
inducing expression of the ΔP37 and then infecting with the
T4 mutant, and detecting progeny phage.

A ΔP37 was prepared as described above, and found
capable of complementing a T4 37⁻ phage in vivo.



The starting plasmid for this construction is one
in which the gene encoding gp37 is cloned immediately
upstream (i.e., 5') of the gene encoding gp36. The plasmid
is digested with Hae III, which deletes the entire 3' region
of gp37 DNA downstream of nucleotide 724 to the 3' terminus,
and also removes the 5' end of gp36 DNA from the 5' terminus
to nucleotide 349. The reaction mixture is identical to that
described in Example 1, except that a different plasmid DNA
is used, and the enzyme is Hae III. Ligation using T4 DNA
ligase, bacterial transformation, and restriction analysis
are also performed as in Example 1. In this case, excision
of the central portion of the gene 37-36 insert and
religation reveals a novel insert of 346 in-frame codons,
which is cut only once by Hae III (after nucleotide 725). The
resulting construct is then expressed in E. coli BL21(DE3) as
described in Example 1.

Successful expression of the gp37-36 chimer is
evidenced by the appearance of a protein product of about
35,000 daltons. This protein will have the first 242
N-terminal amino acids of gp37 fused to the final 104
C-terminal amino acids of gp36 (numbered 118-221). The
utility of this chimer depends upon its ability to dimerize
and attach end-to-end. That is, carboxy termini of said
polypeptide will have the capability of interacting with the
amino terminus of the P37 protein dimer of bacteriophage T4
and to form an attached dimer, and the amino terminus of the
dimer of said polypeptide will have the capability of
interacting with other said chimer polypeptides. This
property can be tested by assaying whether introduction of
ΔP37 initiates dimerization and polymerization.
Alternatively, polyclonal antibodies specific to P36 dimer
may be used to detect P36 subsequent to initiation of
dimerization by ΔP37.

A gp37-36 chimer was prepared similarly to the
procedures described above, except that the restriction
enzyme TaqI was used instead of Hae III. Briefly, the 5'
fragment resulting from TaqI digestion of gene 37 was ligated
to the 3' fragment resulting from TaqI digestion of gene 36.
This produced a construct encoding a gp37-36 chimer in which
amino acids 1-48 of gp37 were fused to amino acids 100-221 of
gp36. This construct was expressed in E. coli BL21(DE3), and
the chimer was detected as an 18 kD protein. This gp37-36
chimer was found to inhibit the growth of wild type T4 when
expression of the gp37-36 chimer was induced prior to
infection (in an in vitro phage inhibition assay).
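
The residue counts and approximate sizes quoted for the two
gp37-36 chimers in this example can be cross-checked with a
short Python sketch. The average residue mass of 110 Da is a
common rule of thumb and is an assumption here, not a value
from the text; the function names are hypothetical.

```python
AVG_RESIDUE_MASS_DA = 110  # rule-of-thumb average (assumption, not from the text)


def segment_length(first: int, last: int) -> int:
    """Number of residues in an inclusive range first..last."""
    return last - first + 1


def estimate_size(segments):
    """Return (residue count, approximate mass in Da) for fused segments."""
    n = sum(segment_length(first, last) for first, last in segments)
    return n, n * AVG_RESIDUE_MASS_DA


constructs = {
    # gp37 residues 1-242 fused to gp36 residues 118-221
    "Hae III chimer": [(1, 242), (118, 221)],
    # gp37 residues 1-48 fused to gp36 residues 100-221
    "TaqI chimer": [(1, 48), (100, 221)],
}

for name, segments in constructs.items():
    n, mass = estimate_size(segments)
    print(f"{name}: {n} residues, ~{mass} Da")
# Hae III chimer: 346 residues, ~38060 Da
# TaqI chimer: 170 residues, ~18700 Da
```

The 346-residue count matches the 346 in-frame codons stated
above, and both estimates are in rough agreement with the
reported sizes of about 35,000 daltons and 18 kD, given that
gel mobility and the assumed average mass are both
approximate.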

EXAMPLE 3
MUTATION OF THE GP37-36 CHIMER
TO PRODUCE COMPLEMENTARY SUPPRESSORS

The goal of this construction is to produce two
variants of a dimerizable P37-36 chimer: one in which the
N-terminus of the polypeptide is mutated (A, designated
*P37-36) and one in which the C-terminus of the polypeptide
is mutated (B, designated P37-36*). The requirement is that
the mutated *P37 N-terminus cannot form a joint with the
wild-type P36 C-terminus, but only with the mutated P36*
C-terminus. The rationale is that A and B each cannot
polymerize independently (as the parent P37-36 protein can),
but can only associate with each other sequentially (i.e.,
P37-36* + *P37-36 -> P37-36*-*P37-36).

A second construct, *P37-P36*, is formed by
recombining *P37-36 and P37-36* in vitro. When the monomers
*gp37-36* and gp37-36 are mixed in the presence of P37
initiator, gp37-36 would dimerize and polymerize to
(P37-36)n; similarly, *P37 would only catalyze the
polymerization of *gp37-36* to (*P37-36*)n. In this case,
the two chimers could be of different size and different
primary sequence with different potential side-group
interactions, and could initiate attachment at different
surfaces depending on the attachment specificity of P37.

The starting bacterial strain is a su° strain of E.
coli (which lacks the ability to suppress amber mutations).
When this strain is infected with a mutant T4 bacteriophage
containing amber mutations in genes 35, 36, and 37, phage
replication is incomplete, since the tail fiber proteins
cannot be synthesized. When this strain is first transformed
with a plasmid that directs the expression of the wild type
gp35, gp36 and gp37 genes and induced with IPTG, and
subsequently infected with mutant phage, infectious phage
particles are produced; this is evidenced by the appearance
of "nibbled" colonies. Nibbled colonies do not appear round,
with smooth edges, but rather have sectors missing. This is
caused by attack of a microcolony by a single phage, which
replicates and prevents the growth of the bacteria in the
missing sector.

For the purposes of this construction, the
3'-terminal region of gene 36 (corresponding to the
C-terminal region of gp36) is mutagenized with randomly doped
oligonucleotides. Randomly doped oligonucleotides are
prepared during chemical synthesis of oligonucleotides, by
adding a trace amount (up to a few percent) of the other
three nucleotides at a given position, so that the resulting
oligonucleotide mix has a small percentage of incorrect
nucleotides at that position. Incorporation of such
oligonucleotides into the plasmid will result in random
mutations (Hutchison et al., Methods Enzymol. 202:356, 1991).
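
The effect of doping can be illustrated with a small Python
simulation. This is a sketch under stated assumptions: the
function doped_oligo, the 3% doping rate, the template
sequence, and the pool size are all hypothetical choices
made for illustration, and each position is treated as
independently doped with an equal mix of the three other
nucleotides.

```python
import random


def doped_oligo(template: str, dope_rate: float, rng: random.Random) -> str:
    """Return one oligonucleotide in which each position carries an
    incorrect base with probability dope_rate."""
    out = []
    for base in template:
        if rng.random() < dope_rate:
            # Pick one of the three other nucleotides at random.
            out.append(rng.choice([b for b in "ACGT" if b != base]))
        else:
            out.append(base)
    return "".join(out)


rng = random.Random(0)                 # fixed seed for reproducibility
template = "ATGGCTAGCAAAGGTACC"        # hypothetical 18-nt target region
pool = [doped_oligo(template, 0.03, rng) for _ in range(1000)]

mutant_fraction = sum(oligo != template for oligo in pool) / len(pool)
print(f"fraction of oligos with at least one change: {mutant_fraction:.2f}")
```

With independent doping, the expected fraction of oligos
carrying at least one change is 1 - (1 - rate)^length, so
longer doped regions or higher doping rates shift the pool
toward multiple mutations per oligonucleotide.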

The mutagenized population of plasmids (containing,
however, unmodified genes 36 and 37), is then transformed
into the su° bacteria, followed by infection with the mutant
T4 phage as above. In this case, the appearance of
non-"nibbled" colonies indicates that the mutated gp36
C-termini can no longer interact with wild type P37 to form
functional tail fibers. The putative gp36* phenotypes found
in such non-nibbled colonies are checked for lack of dimeric
N-termini by appropriate immunospecificity as outlined above,
and positive colonies are used as source of plasmid for the
next step.

Several of these mutated plasmids are recovered and
subjected to a second round of mutagenesis, this time using
doped oligonucleotides that introduce random mutations into
the N-terminal region of gp37 present on the same plasmid.
Again, the (now doubly) mutagenized plasmids are transformed
into the su° strain of E. coli and transformants are
infected with the mutant T4 phage. At this stage, bacterial
plates are screened for the re-appearance of "nibbled"
colonies. A nibbled colony at this stage indicates that the
phage has replicated by virtue of suppression of the
non-functional gp36* mutation(s) by the *P37 mutation. In
other words, such colonies must contain novel *P37
polypeptides that have now acquired the ability to interact
with the P36* proteins encoded on the same plasmid.

The *P37-36 and P37-36* paired suppressor chimers
(A and B as above) are then constructed in the same manner as
described in Example 2. In this case, however, *P37 is used
in place of wild type P37 and P36* is used in place of wild
type P36. A *P37-36* chimer can now be made by restriction
of *P37-36 and P37-36* and religation in the recombined
order. The *P37-36* can be mixed with the P37-36 chimer, and
the polymerization of each can be accomplished independently
in the presence of the other. This is useful when the
rod-like central portions of these chimers have been modified
in different ways.

EXAMPLE 4
CONSTRUCTION OF A GP36-34 CHIMER
The starting plasmid for this construction is one
in which the vector containing gene 57 and the gene encoding
gp36 is cloned immediately upstream (i.e., 5') of the gene
encoding gp34. The plasmid is digested with NdeI, which cuts
after bp 219 of gene 36 and after bp 2594 of gene 34, thereby
deleting the final 148 C-terminal codons from the gp36 moiety
and the first 865 N-terminal codons from the gp34 moiety.
The reaction mixture is identical to that described in
Example 1, except that a different plasmid DNA is used, and
the enzyme used is NdeI (NEB). Ligation using T4 DNA ligase,
bacterial transformation, and restriction analysis are also
performed as in Example 1. This results in a new hybrid gene
encoding a protein of 497 amino acids (73 N-terminal amino
acids of gp36 and 424 C-terminal amino acids of gp34,
numbered 866-1289).

As an alternative, the starting plasmid is cut with
SphI at bp 648 in gene 34, and the Exo-Size Deletion Kit
(NEB) is used to create deletions as described above.

The resulting construct is then expressed in
E. coli BL21(DE3) as described in Example 1. Successful
expression of the gp36-34 chimer is evidenced by the
appearance of a protein product of about 55,000 daltons.
Preferably, the amino termini of the polypeptide homodimer
have the capability of interacting with the gp35 protein, and
then the carboxy termini have the capability of interacting
with other attached gp35 molecules. Successful formation of
the dimer can be detected by reaction with anti-P36
antibodies or by attachment of gp35 or by the in vitro phage
inhibition assay described in Example 2.



ISOLATION OF THERMOLABILE PROTEINS FOR SELF-ASSEMBLY

Thermolabile structures can be utilized in
nanostructures for: a) initiation of chimer polymerization
(e.g., gp37-36) at low temperature and subsequent
inactivation of and separation from the initiator at high
temperature; b) initiation of angle formation between P36 and
gp35 (e.g., variants of gp35 that have thermolabile
attachment sites for P36 N-termini or P34 C-termini, a
variant P36 that forms a thermolabile attachment to gp35,
a variant P34 with a thermolabile C-terminal attachment
site.) Thermolability may be reversible, permitting
reattachment of the appropriate termini when the lower
temperature is restored, or it may be irreversible.

concentrated suspension of phage, and reincubation at 30°C is
performed either before or after dilution. If phage are
successfully reactivated before, but not after, dilution,
this indicates that their gp35 is reversibly thermolabile.
S To create a gene 36 mutation with a themolabile 

gp35-'-P36 linkage, the C«*teraiinus of gene 36 is mutagenized 
as described above, and the vutant selected for 
reversibility. An alternative is to mutagenize gp35 to 
create a gene 35 autant in which the gp35-P36 linkage will 
10 dissociate at 60*C. In this case, incubation with anti*gp35 
antibodies can be used to precipitate the phage without 
P36-*P37 and thus to separate them from the wild-type phage 
and distal half -tail fibers (P36-P37), since the variant gp35 
will remain attacshed to P34. 

EXAMPLE 6
ASSEMBLY OF ONE-DIMENSIONAL RODS
A. Simple Assembly: The P37-36 chimer described in
Example 2 is capable of self-assembly, but requires a P37
initiator to bind the first unit of the rod. Therefore, a
P37 or a ΔP37 dimer is either attached to a solid matrix or
is free in solution to serve as an initiator. If the
initiator is attached to a solid matrix, a thermolabile P37
dimer is preferably used. Addition of an extract containing
gp37-36, or the purified gp37-36 chimer, results in the
assembly of linear multimers of increasing length. In the
matrix-bound case, the final rods are released by a brief
incubation at high temperature (40-60°C, depending on the
characteristics of the particular thermolabile P37 variant).

The ratio of initiator to gp37-36 can be varied,
and the size distribution of the rods is measured by any of
the following methods: 1) size exclusion chromatography;
2) increase in the viscosity of the solution; and 3) direct
measurement by electron microscopy.

B. Staged Assembly: The P37-36 variants *P37-36
and P37-36* described in Example 3 cannot self-polymerize.
This allows the staged assembly of rods of defined length,
according to the following protocol:

1. Attach initiator P37 (preferably
thermolabile) to a matrix.

2. Add excess *gp37-36 to attach and oligomerize
as P37-36 homooligomers to the N-terminus of P37.

3. Wash out unreacted *gp37-36 and flood with
gp37-36*.

4. Wash out unreacted gp37-36* and flood with
excess *gp37-36.

5. Repeat steps 2-4, n-1 times.

6. Release assembly from matrix by brief
incubation at high temperature as above.

The linear dimensions of the protein rods in the
batch will depend upon the lengths of the unit heterochimers
and the number of cycles (n) of addition. This method has
the advantage of ensuring absolute reproducibility of rod
length and a homogeneous, monodisperse size distribution from
one preparation to another.
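The contrast between uncontrolled and staged assembly can be illustrated with a toy simulation. This is a hedged sketch only: the function names are hypothetical, random attachment is an assumed stand-in for uncontrolled polymerization, and the staged model assumes that every matrix-bound rod gains exactly one unit per flood-and-wash addition.

```python
import random


def simple_assembly(n_initiators: int, n_units: int, seed: int = 0) -> list[int]:
    """Toy model: each unit joins a randomly chosen growing rod."""
    rng = random.Random(seed)
    lengths = [0] * n_initiators
    for _ in range(n_units):
        lengths[rng.randrange(n_initiators)] += 1
    return lengths


def staged_assembly(n_initiators: int, n_additions: int) -> list[int]:
    """Toy model: every rod gains exactly one unit per addition step."""
    return [n_additions] * n_initiators


simple = simple_assembly(n_initiators=1000, n_units=8000)
staged = staged_assembly(n_initiators=1000, n_additions=8)

# Both batches have the same mean length (8 units per rod) ...
print(sum(simple) / len(simple), sum(staged) / len(staged))
# ... but only the staged batch is monodisperse.
print(min(simple), max(simple))
print(min(staged), max(staged))
```

In this model the two batches share the same average length, but only the staged batch contains rods of a single length, which is the property the protocol above is designed to deliver.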

EXAMPLE 7
STAGED ASSEMBLY OF POLYGONS

The following assembly strategy utilizes gp35 as an
angle joint to allow the formation of polygons. For the
purpose of this example, the angle formed by gp35 is assumed
to be 137°. The rod unit comprises the P36-34 chimer
described in Example 4, which is incapable of
self-polymerization. The P36-34 homodimer is made from a
bacterial clone in which both gp36-34 and gp57 are expressed.
The gp57 can chaperone the homodimerization of gp36-34 to
P36-34.

1. Initiator: The incomplete distal half fiber
P36-37 is attached to a solid matrix by the P37 C-terminus.
Thermolabile gp35 as described in Example 5 is then added to
form the intact initiator.

2. Excess P36-34 chimer is added to attach a
single P36-34. Following binding to the matrix via gp35, the
unbound chimer is washed out.

3. Wild-type (i.e., non-thermolabile) gp35 is then
added in excess. After incubation, the unbound material is
washed out.

4. Steps 2 and 3 are repeated 7-8 times.

5. The assembly is released from the matrix by
brief incubation at high temperature.

The released polymeric rod, 8 units long, will
form a regular 8-sided polygon, whose sides comprise the
P36-34 dimer and whose joints comprise the wild-type gp35
monomer. However, there will be some multimers of these 8
units bound as helices. When a unit does not close, but
instead adds another to its terminus, the unit cannot close
further and the helix can build in either direction. The
direction of the first overlap also determines the handedness
of the helix. Ten (or seven)-unit rods may form helices more
frequently than polygons since their natural angles are 144°
(or 128.6°). The likelihood of closure of a regular polygon
depends not only on the average angle of gp35 but also on its
flexibility, which can be further manipulated by genetic or
environmental modification.
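The angles quoted above follow from the interior angle of a regular polygon with n sides, 180(n - 2)/n degrees. The following minimal sketch reproduces them and picks the polygon whose interior angle lies closest to an assumed joint angle; the function names and the candidate range of 3 to 12 sides are illustrative assumptions.

```python
def interior_angle(n_sides: int) -> float:
    """Interior angle, in degrees, of a regular polygon with n_sides sides."""
    return 180.0 * (n_sides - 2) / n_sides


def best_polygon(joint_angle_deg: float, candidates=range(3, 13)) -> int:
    """Number of sides whose regular-polygon angle is closest to the joint angle."""
    return min(candidates, key=lambda n: abs(interior_angle(n) - joint_angle_deg))


for n in (7, 8, 10):
    print(n, round(interior_angle(n), 1))  # 7 128.6, then 8 135.0, then 10 144.0

print(best_polygon(137.0))  # 8
```

With the assumed gp35 angle of 137°, the octagon (135°) is the closest regular polygon, while 7-unit and 10-unit rods would need the joint to flex by roughly 8.4° and 7°, respectively.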

The type of polygon that is formed using this
protocol depends upon the length of rod units and the angle
formed by the angle joint. For example, alternating rod
units of different sizes can be used in step 2. In addition,
variant gp35 polypeptides that form angles different than the
natural angle of 137° can be used, allowing the formation of
different regular polygons. Furthermore, for a given polygon
with an even number of sides and equal angles, the sides in
either half can be of any size provided the two halves are
symmetric.

GCMGTG6TQ ATAAAMMT CMOGTAGCT TTAGCT6ATC OTACCGTAGG AACTGAGOGT 120 

GTTAAOGtTQ ATTACTTAAT TCMGAJU^C ACAGTTCAAC AGTATGATCC AACTCGTGOA 180 

TATTTAAAAG ATTTTGTAAT CATTTATGAT AACC6CTTTT GGGCTGCtAT AAATGATATT 240 

CCAAAACCAG CAG6AGCTTT TAATA6CGGA CGCTG6AGA6 CATTACGTAC CGATGCTAAC 300 

TG6ATTA06G TTTCATCTGG TTCATATCAA TTAAAATCTO GTGAAGCMT TTOGGTTAAC 360 

ACOGCAGCTG GAAATGACAT CAOGTTTACT TTACCATCTT CTOCAATTGA T6GTGATACT 420 

ATCGTTCTCC AA6ATATTGG AG6AAAACCT GGAGTTAACC AAGTTTTAAT TGTAGCTCCA 480 

6TACAAAGTA TTGTAAACTT TAGA6GTGAA CAGGTAOGTT CA6TACTAAT GACTCATCCA 540 

AA6TCACA6C TAGTTTTAAT TTTTAGTAAT CGTCT6TGGC AAATGTATGT T6CTGATTAT 600 

AGTA6AGAAG GTATAGTTGT AACACCAGOG AATACTTATC AAGOGCAATC CAACGATTTT 660 

ATCX3TA0GTA GATTTACTTC T6CTGCACCA ATTAATGTCA AACTTCCAAG ATTTGCTAAT 720 

CATC60GATA TTATTAATTT 06TC6ATTTA GATAAACTAA ATOCGCTTTA TCATACAATT 780 

^^GTtACTACAT AOGATGAAAC 6ACTTCAGTA CAA6AAGTTG GAACTOkTTC CATTOAAGGC 840 

OGTACATOGA TTGACGGTTT CTTGATGTTT GATGATAATG A6AAATTATG GA6ACTGTTT 900 

GAC36GGGATA GTAAAGC6C6 TTTAOGTATC ATAAOGACTA ATTCAAACAT TC6TCCAAAT 960 

6AAOAAGTTA TGGTATnGG T6C6AATAAC G6AACAACTC AAACAATTGA GCTTAAGCTT 1020 

CXAACTAATA TTTCT G TT C G TGATACTGTT AAAATTTCCA TGAATTACAT GAGAAAA6GA 1080 

CAAACA6TTA AAATCAAMC TGCTGAtGAA GATAAAATTG CTTCTTCAGT TCAATTGCTG 1140 

CAATTCCCAA AA06CTCAGA ATATCCACCT GAAGCXGAAT GG6TTACMT TCAAGAATTA 1200 

GTTTTTAAOG ATGAAACTAA TTATGTTOCA GTTTTGGAGC TTGCTTACAT AGAAGATTCT 1260 

GATGGAAAAT ATTGGGTTGT ACA6CAAAAC GTTCCAACT6 TAGAAAGA6T AGATTCTTTA 1320 

AAtGATTCTA CTA8A6CM6 ATTA66€»TA ATTGCmAG CTACACIUM3C TCAAGCTAAT 1380 

GTOGARTAG AAAATTCTCC AGAAAAAGAA TTA6CAATTA CTCCMGAhAC GTTAGCTAAT 1440 

06TACTQCIA CAGAAACTCO GAGMSGTATT GCMOAATAG ChMTThCtGC TCAAGTGAAT 1500 

ChBMChCCK CKRCTCRT IGCTGATGAT AlTATCATCA CTCCTAAAAA 6CTGAATGAA 1560 

A6AACXGCSA CAGAAACTOG TAGAGGTGTC CCAGAAATTC CTAC6CAGCA AGAAACTAAT 1620 

GCAGGAACCG ATGAnCIAC AATCAT€ACT OCTAAAAAGC RCAAGCTCG TCAAGGTTCT 1680 

GAATCARAT CIGGTATT6T AAOCTTTGTA TCTACTGGAG GT6CTACTCC AGCTTCTA6C 1740 

OGTGAATTAA ATGGTAOGAA TGTTTATAAT AAAAACACTG ATAATTTA6T TGTTTCACCT 1800 

AAAGCTTTGG ATCAGTATAA AGCTACTCCA ACACAGGAAG GTGGAGTAAT TTTAGCA6TT 1860 

QAAAG7GAM TAATT6CTGG ACMUUrTCAG GAAGGATGGG CAAATGCTGT TGTAAC3GCCA 1920 

6AAA0GTTAC ATAAAAAGAC ATCAACIGAT GGAA6AATTG GTTTAATTGA AATTGCTAOG 1980 

CAAAGT6AAG TXAATACAGG AACIGATEAT ACTCGT6CA6 TCACTCCTAA AACTTTAAAT 2040 

GACCGTAGAG CMCTGAAAG TRAAGTGGT ATAGCTGAAA TTGCTACACA AGTTGAATTC 2100

GliCGCftGGCG TOGAOGATAC TCGTATCTCT ACACGATTAA AAATTAAAAC CA6ATTTAAT

AACCTACTT6

ACTGCAATAA GAG6TTTTGT TAAAACTTCA

TCTGGTTGAA TTACATTCGT TGGTAAT6AT AGAGTCXSGTT CTACCCAAGA TTTAGAACIG 2580

TAT6AGAAAA ATAGCTATGC G6TATCACGA TATGAATTAA ACCGTGTATT AGCAAATTAT

TT6CCACTAA AAGCAAAA6C TGCTGATACA AATTTATTGG AT6GTCTAGA TTCATCTCAG

TTGATT06TA GGGATATTGC ACMACGGTT AATGGTTCAC TAACCTTAAC

AATCT6AGTG CCCCTCTTGT ATCATCTAGT ACT6GTGAAT TTGGTGGTTC

AATAGAACAT TTACCATCCXS TAATACAGGA

GTATCGTTTT C6AAAAAGGT 2880

ATGAOTATTC 2940

GTGATAOGAC C06TTC6ACA GTGTTTGAA6 TTGGOGATCA CACATCTCAT 3000

CACTTTTATT CTCAAC6TAA TAMGACG6T AATATA60GT TTAACATTAA TGGTACTGTA 3060

ATGCCAATAA ACATTAATGC TTOOGGTTTG AT6AATGT6A ATGGCACT6C AACATTOGGT 3120

C6TTCAGTTA CAG0CAAT66 TGAATTCATC AGCAMTCTG CAAATGCTTT TAGAGCAATA 3180

AA0G6TOATT AG66ATTCTT TATTCGTAAT GAT6CCTCTA ATACCTATTT TTTGCTCACT 3240

GGAGC06GTG ATCA6ACTGG T6GTTTTAAT GGATTA06CC CATTATTA21T TAATAATGAA 3300

TCXX36TCA6A TTACAATT66 TGAAGGCTTA ATCATTCCCA AAGGTGTTAC TATAAATTCA 3360

QGC06TTTAA CTGTTAACTC GAGAATTGGT TCTCUGGGTA CTAAAACATC T6ATTTATAT 3420

ACCCOTGCCC CAACATCTCA TACIGTAGGA TTCIG6TCAA TC6ATATTAA TGATTCAGCC 3480

ACTTATAACC AGTTCCOOGG TTATTTTAAA ATGGTTGAAA AAACTAATGA A6TGACTGGG 3540

CncCATACT AACACACTT6 A0CACT06CT TA6AA0GTG6

AC9CAAAMCT GTACACTGAC TCAGTTTGGT 3600

TTTTGTTCAG

GTATTTGACG GAG6TAACCC TCCTCAACGA TCIGATATOG GT6CTTTACC ATCTGATAAT 3780

GCTACAATGG GGAATCTTAC TATTCGXQAT TTCTTOOGAA TTQGTAATGT TCGCATTGTT 3840

CSCT6ACCSCA6 TGAATAAAAC G6TTAAATTT GAAT06GTTG AATAAGA66T ATTATGGAAA 3900

AATTTAT06C CGAGATTTGG ACM06ATAT 6TCCAAA06C CATtTTATCG GAAAGTAATT 3960

CA6TAM3ATA tAAAATMGT AtAGOGGGTT CTTGOCCXpCT TTCTACAGCA GGACCATCAT 4020

ATGTTAAATT TCAGGATAAT CCTGTA6GAA TAGGCGCA6G CCTTCATTTA 4080

A6AGTTTTT6 ACCCTTCCAC CGGAOCATTA GTT6ATAGTA AGTCATAT6C TTTTTCGACT 4140

TOUUItTGATA CTACATGA6C TGCTTTTOTT AGTTTTCATG AATTCTTTGA OOAATAATGG 4200

AATTGTT6CT ATATTAACTA GTGGAAAG6T TAATTTTCCT CCTGAA6TAG TATCTTGGTT 4260

AKGMC06CC G6AA0GTCT6 CCTTTCCATC T6ARCTATA TTGTCAAGAT TTGACGXATC 4320

ATATQCTGCT TTTTATACTT CTTCTAAAAG AGCTATCGCA TTA0AGCATG TTAAACTQA6 4380

TMTMMAA AGCACAGATG ATTATCAAAC TATTTTAGAT GTTGTATTTG ACAGTTTAGA 4440

A0ATGTAG6A GCTACCGG6T TTCCAAGAAG AA0GTAT6AA AGTGTTGAGC AATTCAT6TC 4500

OOCACnGGT GGAACTAATA AC6AAATTGC GAGATTGCCA ACTTCAGCTG CTATAAGTAA 4560

ATTATCTGAT TATAATTTAA TTCCT66AGA TGTTCTTTAT CTTAAAGCTC AGTTATATGC 4620

TGAT0CT6AT TTACTTOCTC nOGAACTAC AAATATATCT ATCCGTTTTT ATAATGCATC 4680

TAAOGOATAT ATTTCTTCAA CAGAAGCTGA ATTTACT6GG CAA6CTGGGT CAT6GGAATT 4740

AAAGGAA6AT TATGTAGTTG TTCCAGAAAA OGCA6TAGGA TTTA06ATAT AOGCACAGAG 4800

AACTOCACAA 6CTGGOCAAG 6TGGCAT6A6 AAATTTAAGC TTTTCTGAAG TATCAAGAAA 4860

T0GAAAOCT6 CTGAATTTGG 06TCAATGGT ATTCGTGTTA ATTATATCT6 4920

OQAATCCXvCT TCACCTCC6G ATATAATGGT ACTTCCTACG CAA6CATCGT CTAAAACTGG 4980

TAAAGT6TTT GGGCAAGAAT TTAGAGAAGT TTAAATTGAG GGACCCXTCG GGTTCCCTTT 5040

TTCTTTATAA ATACTATTCIA AATAAAGGGG GATACAAT6G CT6ATTTAAA AGTAGGTTCA 5100

ACAACTGGAG GCTCTGTCAT TTG6CATCAA G6AAATTTTC CATT6AATCC AGCGGGTGAC 5160

6ATGTACTCT ATAAATGATT TAAAATATAT TCAGAATATA ACAAACCACA AGCTGCTGAT 5220

AA0GATTTO6 TTTCTAAAGC TAATGGTGGT ACTTATCCAT CAAAGGTAAC ATTTAAOGCT 5280

GGCATTGAAG TCXXSATATGC TOCAAACATC ATGAGCCCAT 6CG6GATTTA TGGGGGTAAC 5340

GGT6ATGGTG CTACTTTTGA TAAAGCAAAT ATOGATATTG TTTCATGGTA TG6CX7TAGGA 5400

TTTAAATC6T CATTTGGTTC aiACAGGCCGA ACTCTTGTAA TTAATAGACG CAAT6GTGAT 5460

ATTAACACAA AAGGTGTTGT 6T0GGCAGCT GGTOUUSTAA 6AACTGGTGC GGCT6CTCCT 5520

ATAGGA006A ATGACCXTAC TAGAAA6GAC TAT6TTGATG GA6CAATMA TACTGTTACT 5580

GCAAAYGCAA ACtCTAOGGT GCXA06GTCT GGIGACACCA TGACAGGTAA TTTAACAGCG 5640

ccAAAcmr TCSTCGCAGAA AOGTTCCACG ATnGACCAA 5700

AXOGIAATTA AGOATTCTGT TCAAGAnrC GGCTATTATT AAGAGGACTT AIGGCTACTT 5760

TAAAACAAAT ACAATTTAAA AGAAGCAAAA TOGCAGGAAC ACGTCCTGCT 6CTTCAGTAT 5820

A&fMuivx. Ann «wiwnAfiw«ww#

AAA AAA AnV A flAAMAAVAA A

GAGGAAATAT CATCGATCTA GGTTTTGCTA AAGGOGGGCA AGTTGATGGC AACGTTACTA 5940

TTAAOGOACT RTGAGATTA AATGGCGATT ATGTACAAAC A6GTGGAAT6 ACTGTAAAOG 6000

6AC0CATT66 TTCTACTGAT GGCGTCACTG GAAAAATTTT CAGATCTACA CAGGGTTCAT 6060

TT7ATGCAAC A6CAACAAAC GATACTTCAA ATGCCCATTT ATGGTTT6AA AATGCCGATG 6120

GGACT6AA06 TGGCGTTATA TA7GCTCGCC CTCAAACTAC AACTGACGGT GAAATACGCC 6180

TTAG6GTTAG HGMGCJUUTA GGAAGCHCTG CCAACAGTGA ATTCTATTTC OGCTCTATAA 6240 

ATG6AGG06A ATTTCAGGCT AACCGTATTT TA6CATCAGA TTCGTTAOTA ACAAAAOGCA 6300 

TTGOGGTTGA TACOGTTATT CATCATGCCA AA6GATTTG6 ACAATATGAT TCTCACTCTT 6360 

tGGTTAATTA TGTTTATCCT GGAAC060TG AAACAAATG6 TGTAAACTAT CTT0GTAAA6 6420 

TTOGOGCTAA 6TCCGGTGGT ACAATTTATC ATGAAATTCT TACTGCa^C^ ACAGGCCTG6 6480 

CKAT6AA6T TTCTTGOTCG TCIGGTGATA GACCAGTATT TAAACTATAC GGTATT06TG 6540 

A06ATG0CAQ AATGATTATC OOTAATAGCC TTOCATTAGG TACATTCACT ACAAATTTCC 6600 

0GTCTA6TGA TTATG6CAAC GTOOGTGTAA TGGGOGATAA GTATCTTGTT CTCGGC6ACA 6660 

CTGTAACXGO CTTGTCATAC AAAAAAACTO GT6TATTT0A TCTAGTTGGC 6CTCGATATT 6720 

CT G TT G C TT C TATTACTCCT GACAGTTTCC GTAGTACTCG TAAAGGTATA TTTCGTCGTT 6780 

CTGAG6ACGA A6GC6CAACT T6GATAATGC CTGGTACAAA TGCTGCTCTC TTGTCTGTTC 6840 

AAACACAAGC T6ATAATAAC AATGCTGGAG A06GACAAAC CCATATOG6G TACAAT6CTG 6900 

GOGGTAAAAT GAACGACTAT TTCOGTGGTA CAG6TCAGAT GAATATCAAT AOCCAACAAG 6960 

GTATGGAAAT TAACCC3GGGT ATTTTGAAAT TGGTAACT6G CTCTAATAAT GTACAATTTT 7020 

AOGCTGACSGG AACTATTTCT TCCATTCAAC CTATTAAATT AGATAAOGA6 ATATTTTTAA 7080 

CTAAATCTAA TAATACTGCG GGTCnAAAT rTGGAGCTCC TAGCCAAOTT GATGGCACAA 7140 

GGACTATCCA AT6GAA0GGT G6TACT0GCG AA06ACAGAA TAAAAACTAT GTGATTATTA 7200 

AAGCATGGG6 TAACTCATTT AAT60CACTG 6TGATA6ATC TCGCGAAA06 CTTTTCCAA6 7260 

TATCAGATA6 TGAAGGATAT TATTTTTATO CTCATCWTAA A6CTCCAACC GGOOACGAAA 7320 

CTATTGGACG TATTGAA6CT CAATTTGCTG 66GATGTTTA TGCTAAAGGT ATTATTGCCA 7380 

A06GAAATTT TA6AGTTGTT GG6TCAMG6 CTTTA60066 CAATGTTACT ATGTCTAACG 7440 

O TTT G TTT G T CCAA0G7GGT TCTTCTATTA CTGGACAAGT TAAAATTGGC GGAACAGCM 7500 

AOGCACTGAG AATTTG6AAC 6CT6AATATG GTGCTATrTT CCGTCGTTCO GAAA6TAACT 7560 

TRATATTAT TOCAACCAAT CAAAATGAA6 GA6AAAGTG6 AGAGATTCAC AGCTCTTTGA 7620 

GAOCX6TGAG AATAGGATTA AAC36A1GGCA T G GW GGG T T A6GAAGAGAT TCTTTTATA6 7680 

TAGATGAAAA TAATGCTTTA ACXACGA7AA ACA6TAACTC TC6CATTAAT Ga»ACTTTA 7740 

GAATGCAATT GGG6CAGTG6 6GAXACATTO AtGCMAATG TACT6ATGCT GnOGCOOGG 7800 

OGGGTOCAOG TTCATTTGCT TCCCAGAATA ATGAAGACGT OCGT6CGC0G TPCTATATGA 7860 

ATATTGATAG AACTGATCCT AGTGCATAT6 TTCCTATTTT GAAACJUiCOT TATGTTCAAG 7920 

GCAATGGCTG CTATTCATTA GGGACTTTAA TTAATAATCG TAATTTCC6A GTTCATTACC 7980 

ATGGG6G0GG AGATAA0G6T TCTACACGTC CACAGACTGC TGATTTrGGA TGGGAATTTA 8040 

TTAAAAAC5GG TGATTMATT TCACGTOGCG ATTTAATAGC AGGCMAGTC AGATTTGATA 8100 

GAACTG6TAA TATCACT6GT GGTTCtGGTA ATTnOCTAA CTTAAACAGT ACAATTOAAT 8160 

CACTTAAAAC TGATATCAT6 TC6AGTTACC CAATtGGTGC TCCGATTCCT TGGCOGAGTG 8220 
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ATTCMTTCC TGCTOGATTT GCTTT6ATGG AAGGTCAGAC CTTTGATAAG TCC6CATATC 8280 

GAAAGTTAGC TGR6CATAT CCTAGCGGTG TTATTOCAGA TAT6C60GG6 CAAACTATCA 8340 

AOQGTAAAOC AA6TGGT0GT OCTGTTmA GCGCTGAGOC AGATGGT6TT AAGGCTCATA 8400 

GOCATAOTOC ATOGGCTTCA AGTACTGACT TA66TACTAA AACCACATCA A6CTTTGACT 8460 

ATGGTA06AA GG6AACTAAC AGTA0GG6T6 GACAGACTCA CTCT6GTAGT GGTTCTACTA 8520 

GCACAAATGG TGA6CAGA6C CACTACATC6 AGGCATGGAA T6GTACTGGT GTA66TG6TA 8580 

ATAA0ATGTC ATCATAtGOC ATATCATACA GGG0G6GTGG GAGTAACACT AATGCAGCAG 8640 

GGAACCMAG TCACACTTTC TCTTTTGGGA CTAGC3IGTCC T66C6ACCAT TCXX^iCTCTG 8700 

TAGGTATTGG tGCTGATACC CACA06GTA0 CAATTGGATC ACATGGTCAT ACTATCACTG 8760 

TAAATAGTAC A6GTAATACA GAAAACA06G TTAAAAACAT TGCTTTTAAC TATATCGTTC 8820 

GTTTAGCATA AGGAGAGGGG CTTOGGCCCT TCTAA 8855 
(2) INFORMATION FOR SEQ ID NO:2:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1289 amino acids

(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Bacteriophage T4

(vii) IMMEDIATE SOURCE:

(B) CLONE: p34 amino acid

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:2:

Met Ala Glu Ile Lys Arg Glu Phe Arg Ala Glu Asp Gly Leu Asp Ala
1 5 10 15

Gly Gly Asp Lys Ile Ile Asn Val Ala Leu Ala Asp Arg Thr Val Gly
20 25 30

Thr Asp Gly Val Asn Val Asp Tyr Leu Ile Gln Glu Asn Thr Val Gln
35 40 45

Gln Tyr Asp Pro Thr Arg Gly Tyr Leu Lys Asp Phe Val Ile Ile Tyr
50 55 60

Asp Asn Arg Phe Trp Ala Ala Ile Asn Asp Ile Pro Lys Pro Ala Gly
65 70 75 80

Ala Phe Asn Ser Gly Arg Trp Arg Ala Leu Arg Thr Asp Ala Asn Trp
85 90 95

Ile Thr Val Ser Ser Gly Ser Tyr Gln Leu Lys Ser Gly Glu Ala Ile
100 105 110

Ser Val Asn Thr Ala Ala Gly Asn Asp Ile Thr Phe Thr Leu Pro Ser
115 120 125

Ser Pro Ile Asp Gly Asp Thr Ile Val Leu Gln Asp Ile Gly Gly Lys
130 135 140

Pro Gly Val Asn Gln Val Leu Ile Val Ala Pro Val Gln Ser Ile Val
145 150 155 160

Asn Phe Arg Gly Glu Gln Val Arg Ser Val Leu Met Thr His Pro Lys
165 170 175

Ser Gln Leu Val Leu Ile Phe Ser Asn Arg Leu Trp Gln Met Tyr Val
180 185 190

Ala Asp Tyr Ser Arg Glu Ala Ile Val Val Thr Pro Ala Asn Thr Tyr
195 200 205

Gln Ala Gln Ser Asn Asp Phe Ile Val Arg Arg Phe Thr Ser Ala Ala
210 215 220

Pro Ile Asn Val Lys Leu Pro Arg Phe Ala Asn His Gly Asp Ile Ile
225 230 235 240

Asn Phe Val Asp Leu Asp Lys Leu Asn Pro Leu Tyr His Thr Ile Val
245 250 255

Thr Thr Tyr Asp Glu Thr Thr Ser Val Gln Glu Val Gly Thr His Ser
260 265 270

Ile Glu Gly Arg Thr Ser Ile Asp Gly Phe Leu Met Phe Asp Asp Asn
275 280 285

Glu Lys Leu Trp Arg Leu Phe Asp Gly Asp Ser Lys Ala Arg Leu Arg
290 295 300

Ile Ile Thr Thr Asn Ser Asn Ile Arg Pro Asn Glu Glu Val Met Val
305 310 315 320

Phe Gly Ala Asn Asn Gly Thr Thr Gln Thr Ile Glu Leu Lys Leu Pro
325 330 335

Thr Asn Ile Ser Val Gly Asp Thr Val Lys Ile Ser Met Asn Tyr Met
340 345 350

Arg Lys Gly Gln Thr Val Lys Ile Lys Ala Ala Asp Glu Asp Lys Ile
355 360 365

Ala Ser Ser Val Gln Leu Leu Gln Phe Pro Lys Arg Ser Glu Tyr Pro
370 375 380

Pro Glu Ala Glu Trp Val Thr Val Gln Glu Leu Val Phe Asn Asp Glu
385 390 395 400

Thr Asn Tyr Val Pro Val Leu Glu Leu Ala Tyr Ile Glu Asp Ser Asp
405 410 415

Gly Lys Tyr Trp Val Val Gln Gln Asn Val Pro Thr Val Glu Arg Val
420 425 430

Asp Ser Leu Asn Asp Ser Thr Arg Ala Arg Leu Gly Val Ile Ala Leu
435 440 445

Ala Thr Gln Ala Gln Ala Asn Val Asp Leu Glu Asn Ser Pro Gln Lys
450 455 460

Glu Leu Ala Ile Thr Pro Glu Thr Leu Ala Asn Arg Thr Ala Thr Glu
465 470 475 480

Thr Arg Arg Gly Ile Ala Arg Ile Ala Thr Thr Ala Gln Val Asn Gln
485 490 495

Asn Thr Thr Phe Ser Phe Ala Asp Asp Ile Ile Ile Thr Pro Lys Lys
500 505 510

Leu Asn Glu Arg Thr Ala Thr Glu Thr Arg Arg Gly Val Ala Glu Ile
515 520 525

Ala Thr Gln Gln Glu Thr Asn Ala Gly Thr Asp Asp Thr Thr Ile Ile
530 535 540

Thr Pro Lys Lys Leu Gln Ala Arg Gln Gly Ser Glu Ser Leu Ser Gly
545 550 555 560

Ile Val Thr Phe Val Ser Thr Ala Gly Ala Thr Pro Ala Ser Ser Arg
565 570 575

Glu Leu Asn Gly Thr Asn Val Tyr Asn Lys Asn Thr Asp Asn Leu Val
580 585 590

Val Ser Pro Lys Ala Leu Asp Gln Tyr Lys Ala Thr Pro Thr Gln Gln
595 600 605

Gly Ala Val Ile Leu Ala Val Glu Ser Glu Val Ile Ala Gly Gln Ser
610 615 620

Gln Gln Gly Trp Ala Asn Ala Val Val Thr Pro Glu Thr Leu His Lys
625 630 635 640

Lys Thr Ser Thr Asp Gly Arg Ile Gly Leu Ile Glu Ile Ala Thr Gln
645 650 655

Ser Glu Val Asn Thr Gly Thr Asp Tyr Thr Arg Ala Val Thr Pro Lys
660 665 670

Thr Leu Asn Asp Arg Arg Ala Thr Glu Ser Leu Ser Gly Ile Ala Glu
675 680 685

Ile Ala Thr Gln Val Glu Phe Asp Ala Gly Val Asp Asp Thr Arg Ile
690 695 700

Ser Thr Pro Leu Lys Ile Lys Thr Arg Phe Asn Ser Thr Asp Arg Thr
705 710 715 720

Ser Val Val Ala Leu Ser Gly Leu Val Glu Ser Gly Thr Leu Trp Asp
725 730 735

His Tyr Thr Leu Asn Ile Leu Glu Ala Asn Glu Thr Gln Arg Gly Thr
740 745 750

Leu Arg Val Ala Thr Gln Val Glu Ala Ala Ala Gly Thr Leu Asp Asn
755 760 765

Val Leu Ile Thr Pro Lys Lys Leu Leu Gly Thr Lys Ser Thr Glu Ala
770 775 780

Gln Glu Gly Val Ile Lys Val Ala Thr Gln Ser Glu Thr Val Thr Gly
785 790 795 800

Thr Ser Ala Asn Thr Ala Val Ser Pro Lys Asn Leu Lys Trp Ile Ala
805 810 815

Gln Ser Glu Pro Thr Trp Ala Ala Thr Thr Ala Ile Arg Gly Phe Val
820 825 830

Lys Thr Ser Ser Gly Ser Ile Thr Phe Val Gly Asn Asp Thr Val Gly
835 840 845

Ser Thr Gln Asp Leu Glu Leu Tyr Glu Lys Asn Ser Tyr Ala Val Ser
850 855 860

Pro Tyr Glu Leu Asn Arg Val Leu Ala Asn Tyr Leu Pro Leu Lys Ala
865 870 875 880

Lys Ala Ala Asp Thr Asn Leu Leu Asp Gly Leu Asp Ser Ser Gln Phe
885 890 895

Ile Arg Arg Asp Ile Ala Gln Thr Val Asn Gly Ser Leu Thr Leu Thr
900 905 910

Gln Gln Thr Asn Leu Ser Ala Pro Leu Val Ser Ser Ser Thr Gly Glu
915 920 925

Phe Gly Gly Ser Leu Ala Ala Asn Arg Thr Phe Thr Ile Arg Asn Thr
930 935 940

Gly Ala Pro Thr Ser Ile Val Phe Glu Lys Gly Pro Ala Ser Gly Ala
945 950 955 960

Asn Pro Ala Gln Ser Met Ser Ile Arg Val Trp Gly Asn Gln Phe Gly
965 970 975

Gly Gly Ser Asp Thr Thr Arg Ser Thr Val Phe Glu Val Gly Asp Asp
980 985 990

Thr Ser His His Phe Tyr Ser Gln Arg Asn Lys Asp Gly Asn Ile Ala
995 1000 1005

Phe Asn Ile Asn Gly Thr Val Met Pro Ile Asn Ile Asn Ala Ser Gly
1010 1015 1020

Leu Met Asn Val Asn Gly Thr Ala Thr Phe Gly Arg Ser Val Thr Ala
1025 1030 1035 1040

Asn Gly Glu Phe Ile Ser Lys Ser Ala Asn Ala Phe Arg Ala Ile Asn
1045 1050 1055

Gly Asp Tyr Gly Phe Phe Ile Arg Asn Asp Ala Ser Asn Thr Tyr Phe
1060 1065 1070

Leu Leu Thr Ala Ala Gly Asp Gln Thr Gly Gly Phe Asn Gly Leu Arg
1075 1080 1085

Pro Leu Leu Ile Asn Asn Gln Ser Gly Gln Ile Thr Ile Gly Glu Gly
1090 1095 1100

Leu Ile Ile Ala Lys Gly Val Thr Ile Asn Ser Gly Gly Leu Thr Val
1105 1110 1115 1120

Asn Ser Arg Ile Arg Ser Gln Gly Thr Lys Thr Ser Asp Leu Tyr Thr
1125 1130 1135

Arg Ala Pro Thr Ser Asp Thr Val Gly Phe Trp Ser Ile Asp Ile Asn
1140 1145 1150

Asp Ser Ala Thr Tyr Asn Gln Phe Pro Gly Tyr Phe Lys Met Val Glu
1155 1160 1165

Lys Thr Asn Glu Val Thr Gly Leu Pro Tyr Leu Glu Arg Gly Glu Glu
1170 1175 1180

Val Lys Ser Pro Gly Thr Leu Thr Gln Phe Gly Asn Thr Leu Asp Ser
1185 1190 1195 1200

Leu Tyr Gln Asp Trp Ile Thr Tyr Pro Thr Thr Pro Glu Ala Arg Thr
1205 1210 1215

Thr Arg Trp Thr Arg Thr Trp Gln Lys Thr Lys Asn Ser Trp Ser Ser
1220 1225 1230

Phe Val Gln Val Phe Asp Gly Gly Asn Pro Pro Gln Pro Ser Asp Ile
1235 1240 1245

Gly Ala Leu Pro Ser Asp Asn Ala Thr Met Gly Asn Leu Thr Ile Arg
1250 1255 1260

Asp Phe Leu Arg Ile Gly Asn Val Arg Ile Val Pro Asp Pro Val Asn
1265 1270 1275 1280

Lys Thr Val Lys Phe Glu Trp Val Glu
1285

(2) INFORMATION FOR SEQ ID NO:3:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 65 amino acids

(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Bacteriophage T4

(vii) IMMEDIATE SOURCE:

(B) CLONE: ORF X amino acid

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:3:

Met Glu Lys Phe Met Ala Glu Ile Trp Thr Arg Ile Cys Pro Asn Ala
1 5 10 15

Ile Leu Ser Glu Ser Asn Ser Val Arg Tyr Lys Ile Ser Ile Ala Gly
20 25 30

Ser Cys Pro Leu Ser Thr Ala Gly Pro Ser Tyr Val Lys Phe Gln Asp
35 40 45

Asn Pro Val Gly Ser Gln Thr Phe Arg Arg Arg Pro Ser Phe Lys Ser
50 55 60

Phe
65

(2) INFORMATION FOR SEQ ID NO:4:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 295 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Bacteriophage T4

(vii) IMMEDIATE SOURCE:

(B) CLONE: p35 amino acid

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4:

Met Leu Phe Arg Leu Gln Met Ile Leu His Gln Leu Leu Leu Leu Val
1 5 10 15

Phe Met Asn Ser Leu Thr Asn Asn Arg Ile Val Ala Ile Leu Thr Ser
20 25 30

Gly Lys Val Asn Phe Pro Pro Glu Val Val Ser Trp Leu Arg Thr Ala
35 40 45

Gly Thr Ser Ala Phe Pro Ser Asp Ser Ile Leu Ser Arg Phe Asp Val
50 55 60

Ser Tyr Ala Ala Phe Tyr Thr Ser Ser Lys Arg Ala Ile Ala Leu Glu
65 70 75 80

His Val Lys Leu Ser Asn Arg Lys Ser Thr Asp Asp Tyr Gln Thr Ile
85 90 95

Leu Asp Val Val Phe Asp Ser Leu Glu Asp Val Gly Ala Thr Gly Phe
100 105 110

Pro Arg Arg Thr Tyr Glu Ser Val Glu Gln Phe Met Ser Ala Val Gly
115 120 125

Gly Thr Asn Asn Glu Ile Ala Arg Leu Pro Thr Ser Ala Ala Ile Ser
130 135 140

Lys Leu Ser Asp Tyr Asn Leu Ile Pro Gly Asp Val Leu Tyr Leu Lys
145 150 155 160

Ala Gln Leu Tyr Ala Asp Ala Asp Leu Leu Ala Leu Gly Thr Thr Asn
165 170 175

Ile Ser Ile Arg Phe Tyr Asn Ala Ser Asn Gly Tyr Ile Ser Ser Thr
180 185 190

Gln Ala Glu Phe Thr Gly Gln Ala Gly Ser Trp Glu Leu Lys Glu Asp
195 200 205

Tyr Val Val Val Pro Glu Asn Ala Val Gly Phe Thr Ile Tyr Ala Gln
210 215 220

Arg Thr Ala Gln Ala Gly Gln Gly Gly Met Arg Asn Leu Ser Phe Ser
225 230 235 240

Glu Val Ser Arg Asn Gly Gly Ile Ser Lys Pro Ala Glu Phe Gly Val
245 250 255

Asn Gly Ile Arg Val Asn Tyr Ile Cys Glu Ser Ala Ser Pro Pro Asp
260 265 270

Ile Met Val Leu Pro Thr Gln Ala Ser Ser Lys Thr Gly Lys Val Phe
275 280 285

Gly Gln Glu Phe Arg Glu Val
290 295

(2) INFORMATION FOR SEQ ID NO: 5:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 221 amino acids

(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Bacteriophage T4

(vii) IMMEDIATE SOURCE:

(B) CLONE: p36 amino acid

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:5:

Met Ala Asp Leu Lys Val Gly Ser Thr Thr Gly Gly Ser Val Ile Trp
1 5 10 15

His Gln Gly Asn Phe Pro Leu Asn Pro Ala Gly Asp Asp Val Leu Tyr
20 25 30

Lys Ser Phe Lys Ile Tyr Ser Glu Tyr Asn Lys Pro Gln Ala Ala Asp
35 40 45

Asn Asp Phe Val Ser Lys Ala Asn Gly Gly Thr Tyr Ala Ser Lys Val
50 55 60

Thr Phe Asn Ala Gly Ile Gln Val Pro Tyr Ala Pro Asn Ile Met Ser
65 70 75 80

Pro Cys Gly Ile Tyr Gly Gly Asn Gly Asp Gly Ala Thr Phe Asp Lys
85 90 95

Ala Asn Ile Asp Ile Val Ser Trp Tyr Gly Val Gly Phe Lys Ser Ser
100 105 110

Phe Gly Ser Thr Gly Arg Thr Val Val Ile Asn Thr Arg Asn Gly Asp
115 120 125

Ile Asn Thr Lys Gly Val Val Ser Ala Ala Gly Gln Val Arg Ser Gly
130 135 140

Ala Ala Ala Pro Ile Ala Ala Asn Asp Leu Thr Arg Lys Asp Tyr Val
145 150 155 160

Asp Gly Ala Ile Asn Thr Val Thr Ala Asn Ala Asn Ser Arg Val Leu
165 170 175

Arg Ser Gly Asp Thr Met Thr Gly Asn Leu Thr Ala Pro Asn Phe Phe
180 185 190

Ser Gln Asn Pro Ala Ser Gln Pro Ser His Val Pro Arg Phe Asp Gln
195 200 205

Ile Val Ile Lys Asp Ser Val Gln Asp Phe Gly Tyr Tyr
210 215 220

(2) INFORMATION FOR SEQ ID NO:6:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1026 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Bacteriophage T4

(vii) IMMEDIATE SOURCE:

(B) CLONE: p37 amino acid

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6:

Met Ala Thr Leu Lys Gln Ile Gln Phe Lys Arg Ser Lys Ile Ala Gly
1 5 10 15

Thr Arg Pro Ala Ala Ser Val Leu Ala Glu Gly Glu Leu Ala Ile Asn
20 25 30

Leu Lys Asp Arg Thr Ile Phe Thr Lys Asp Asp Ser Gly Asn Ile Ile
35 40 45

Asp Leu Gly Phe Ala Lys Gly Gly Gln Val Asp Gly Asn Val Thr Ile
50 55 60

Asn Gly Leu Leu Arg Leu Asn Gly Asp Tyr Val Gln Thr Gly Gly Met
65 70 75 80

Thr Val Asn Gly Pro Ile Gly Ser Thr Asp Gly Val Thr Gly Lys Ile
85 90 95

Phe Arg Ser Thr Gln Gly Ser Phe Tyr Ala Arg Ala Thr Asn Asp Thr
100 105 110

Ser Asn Ala His Leu Trp Phe Glu Asn Ala Asp Gly Thr Glu Arg Gly
115 120 125

Val Ile Tyr Ala Arg Pro Gln Thr Thr Thr Asp Gly Glu Ile Arg Leu
130 135 140

Arg Val Arg Gln Gly Thr Gly Ser Thr Ala Asn Ser Glu Phe Tyr Phe
145 150 155 160

Arg Ser Ile Asn Gly Gly Glu Phe Gln Ala Asn Arg Ile Leu Ala Ser
165 170 175

Asp Ser Leu Val Thr Lys Arg Ile Ala Val Asp Thr Val Ile His Asp
180 185 190

Ala Lys Ala Phe Gly Gln Tyr Asp Ser His Ser Leu Val Asn Tyr Val
195 200 205

Tyr Pro Gly Thr Gly Glu Thr Asn Gly Val Asn Tyr Leu Arg Lys Val
210 215 220

Arg Ala Lys Ser Gly Gly Thr Ile Tyr His Glu Ile Val Thr Ala Gln
225 230 235 240

Thr Gly Leu Ala Asp Glu Val Ser Trp Trp Ser Gly Asp Thr Pro Val
245 250 255

Phe Lys Leu Tyr Gly Ile Arg Asp Asp Gly Arg Met Ile Ile Arg Asn
260 265 270

Ser Leu Ala Leu Gly Thr Phe Thr Thr Asn Phe Pro Ser Ser Asp Tyr
275 280 285

Gly Asn Val Gly Val Met Gly Asp Lys Tyr Leu Val Leu Gly Asp Thr
290 295 300

Val Thr Gly Leu Ser Tyr Lys Lys Thr Gly Val Phe Asp Leu Val Gly
305 310 315 320

Gly Gly Tyr Ser Val Ala Ser Ile Thr Pro Asp Ser Phe Arg Ser Thr
325 330 335

Arg Lys Gly Ile Phe Gly Arg Ser Glu Asp Gln Gly Ala Thr Trp Ile
340 345 350

Met Pro Gly Thr Asn Ala Ala Leu Leu Ser Val Gln Thr Gln Ala Asp
355 360 365

Asn Asn Asn Ala Gly Asp Gly Gln Thr His Ile Gly Tyr Asn Ala Gly
370 375 380

Gly Lys Met Asn His Tyr Phe Arg Gly Thr Gly Gln Met Asn Ile Asn
385 390 395 400

Thr Gln Gln Gly Met Glu Ile Asn Pro Gly Ile Leu Lys Leu Val Thr
405 410 415

Gly Ser Asn Asn Val Gln Phe Tyr Ala Asp Gly Thr Ile Ser Ser Ile
420 425 430

Gln Pro Ile Lys Leu Asp Asn Glu Ile Phe Leu Thr Lys Ser Asn Asn
435 440 445

Thr Ala Gly Leu Lys Phe Gly Ala Pro Ser Gln Val Asp Gly Thr Arg
450 455 460

Thr Ile Gln Trp Asn Gly Gly Thr Arg Glu Gly Gln Asn Lys Asn Tyr
465 470 475 480

Val Ile Ile Lys Ala Trp Gly Asn Ser Phe Asn Ala Thr Gly Asp Arg
485 490 495

Ser Arg Glu Thr Val Phe Gln Val Ser Asp Ser Gln Gly Tyr Tyr Phe
500 505 510

Tyr Ala His Arg Lys Ala Pro Thr Gly Asp Glu Thr Ile Gly Arg Ile
515 520 525

Glu Ala Gln Phe Ala Gly Asp Val Tyr Ala Lys Gly Ile Ile Ala Asn
530 535 540

Gly Asn Phe Arg Val Val Gly Ser Ser Ala Leu Ala Gly Asn Val Thr
545 550 555 560

Met Ser Asn Gly Leu Phe Val Gln Gly Gly Ser Ser Ile Thr Gly Gln
565 570 575

Val Lys Ile Gly Gly Thr Ala Asn Ala Leu Arg Ile Trp Asn Ala Glu
580 585 590

Tyr Gly Ala Ile Phe Arg Arg Ser Glu Ser Asn Phe Tyr Ile Ile Pro
595 600 605

Thr Asn Gln Asn Glu Gly Glu Ser Gly Asp Ile His Ser Ser Leu Arg
610 615 620

Pro Val Arg Ile Gly Leu Asn Asp Gly Met Val Gly Leu Gly Arg Asp
625 630 635 640

Ser Phe Ile Val Asp Gln Asn Asn Ala Leu Thr Thr Ile Asn Ser Asn
645 650 655

Ser Arg Ile Asn Ala Asn Phe Arg Met Gln Leu Gly Gln Ser Ala Tyr
660 665 670

Ile Asp Ala Glu Cys Thr Asp Ala Val Arg Pro Ala Gly Ala Gly Ser
675 680 685

Phe Ala Ser Gln Asn Asn Glu Asp Val Arg Ala Pro Phe Tyr Met Asn
690 695 700

He Asp Arg Thr Asp Ala Ser Ala Tyr Val Pro He Leu Lys Gin Arg 

705 710 715 720 

Tyr Val Gin Gly Aen Gly Cys Tyr Ser Leu Gly Thr Leu He Asn Asn 
725 730 735 

Gly Asn Phe Arg Val His Tyr His Gly Gly Gly Asp Asn Gly Ser Thr
740 745 750

Gly Pro Gln Thr Ala Asp Phe Gly Trp Glu Phe Ile Lys Asn Gly Asp
755 760 765

Phe Ile Ser Pro Arg Asp Leu Ile Ala Gly Lys Val Arg Phe Asp Arg
770 775 780

Thr Gly Asn Ile Thr Gly Gly Ser Gly Asn Phe Ala Asn Leu Asn Ser
785 790 795 800

Thr Ile Glu Ser Leu Lys Thr Asp Ile Met Ser Ser Tyr Pro Ile Gly
805 810 815

Ala Pro Ile Pro Trp Pro Ser Asp Ser Val Pro Ala Gly Phe Ala Leu
820 825 830

Met Glu Gly Gln Thr Phe Asp Lys Ser Ala Tyr Pro Lys Leu Ala Val
835 840 845

Ala Tyr Pro Ser Gly Val Ile Pro Asp Met Arg Gly Gln Thr Ile Lys
850 855 860 

Gly Lys Pro Ser Gly Arg Ala Val Leu Ser Ala Glu Ala Asp Gly Val 
865 870 875 880 

Lys Ala His Ser His Ser Ala Ser Ala Ser Ser Thr Asp Leu Gly Thr 
885 890 895 

Lys Thr Thr Ser Ser Phe Asp Tyr Gly Thr Lys Gly Thr Asn Ser Thr
900 905 910

Gly Gly His Thr His Ser Gly Ser Gly Ser Thr Ser Thr Asn Gly Glu
915 920 925

His Ser His Tyr Ile Glu Ala Trp Asn Gly Thr Gly Val Gly Gly Asn
930 935 940

Lys Met Ser Ser Tyr Ala Ile Ser Tyr Arg Ala Gly Gly Ser Asn Thr
945 950 955 960

Asn Ala Ala Gly Asn His Ser His Thr Phe Ser Phe Gly Thr Ser Ser
965 970 975

Ala Gly Asp His Ser His Ser Val Gly Ile Gly Ala His Thr His Thr
980 985 990

Val Ala Ile Gly Ser His Gly His Thr Ile Thr Val Asn Ser Thr Gly
995 1000 1005

Asn Thr Glu Asn Thr Val Lys Asn Ile Ala Phe Asn Tyr Ile Val Arg
1010 1015 1020

Leu Ala 

1025 
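
For readers who want to work with the listing above programmatically, the three-letter residue codes can be collapsed into a one-letter string. The following is only an illustrative sketch and is not part of the patent: the helper name `three_to_one` is hypothetical, and the sample text is the final stretch of the listing (the last two rows) written with standard residue codes.

```python
# Standard three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def three_to_one(listing: str) -> str:
    """Convert whitespace-separated three-letter codes to a one-letter string.

    Purely numeric tokens (the position rulers under each row) are skipped.
    Any other unrecognised token raises ValueError, so a misread code is
    reported instead of being silently dropped.
    """
    residues = []
    for token in listing.split():
        if token.isdigit():
            continue
        code = token.capitalize()
        if code not in THREE_TO_ONE:
            raise ValueError(f"unrecognised residue code: {token!r}")
        residues.append(THREE_TO_ONE[code])
    return "".join(residues)


# Hypothetical sample: the last two rows of the listing, with their rulers.
tail = """
Asn Thr Glu Asn Thr Val Lys Asn Ile Ala Phe Asn Tyr Ile Val Arg
1010 1015 1020
Leu Ala
1025
"""
print(three_to_one(tail))
```

Raising on unknown tokens is a deliberate choice in this sketch: scanned listings often contain misread codes, and stopping on them is safer than guessing.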

What is claimed is:

1. An isolated polypeptide consisting essentially
of the gp37 tail fiber protein of bacteriophage T4 lacking
amino acids 99-496 (SEQ ID NO:6) when numbered from the amino
terminus, wherein said polypeptide has the capability to form
dimers and interact with the P36 protein oligomer of
bacteriophage T4.

2. An isolated polypeptide consisting essentially
of a fusion protein between the gp36 and gp37 proteins of
bacteriophage T4, wherein amino acid residues 1-242 of gp37
(SEQ ID NO:6) are fused in proper reading frame to amino acid
residues 118-221 of gp36 (SEQ ID NO:5).

3. The polypeptide of claim 2 wherein a plurality
of carboxy termini of said polypeptide have the capability of
interacting with the amino terminus of the P37 protein
oligomer of bacteriophage T4 and to form an attached oligomer
and the amino termini of the oligomer of said polypeptide
have the capability of interacting with the carboxy termini
of gp36 polypeptides of bacteriophage T4.

4. An isolated polypeptide oligomer consisting
essentially of two gp37 polypeptides of bacteriophage T4,
wherein the amino termini of said oligomer lack the
capability of interacting with the carboxy termini of gp36
polypeptides of bacteriophage T4.

5. An isolated polypeptide oligomer consisting
essentially of the P37 protein of bacteriophage T4, wherein
the amino termini of said oligomer lack the capability of
interacting with the carboxy termini of gp36 polypeptides of
bacteriophage T4.

6. An isolated polypeptide consisting essentially
of a variant of the gp36 protein of bacteriophage T4, wherein
said polypeptide lacks the capability of interacting with the
amino terminus of the P37 protein oligomer of bacteriophage
T4.

7. An isolated polypeptide consisting essentially
of a fusion protein between the gp36 and gp34 proteins of
bacteriophage T4, wherein amino acid residues 1-73 of gp36
(SEQ ID NO:5) are fused in proper reading frame
amino-terminal to amino acid residues 866-1289 of gp34 (SEQ
ID NO:2).

8. An oligomer of the polypeptide of claim 7,
wherein the amino termini of said dimer have the capability
of interacting with the gp35 protein of bacteriophage T4.

9. An isolated polypeptide consisting essentially
of a variant of the gp35 protein of bacteriophage T4, wherein
said polypeptide forms an angle of less than about 125° when
combined with the P34 and P36-P37 protein oligomers of
bacteriophage T4, under conditions wherein the wild-type gp35
protein forms an angle of 137° when combined with said
oligomers.

10. An isolated polypeptide consisting essentially
of a variant of the gp35 protein of bacteriophage T4, wherein
said polypeptide forms an angle of more than about 145° when
combined with the P34 and P36-P37 protein oligomers of
bacteriophage T4, under conditions wherein the wild-type gp35
protein forms an angle of 137° when combined with said
oligomers.

11. An isolated polypeptide consisting essentially
of a variant of the gp35 protein of bacteriophage T4, wherein
the interaction of said polypeptide with the P34 protein
oligomer of bacteriophage T4 is unstable at temperatures
between about 40°C and about 60°C.

12. An isolated polypeptide oligomer consisting
essentially of a variant of the P37 protein of bacteriophage
T4, wherein the interaction of said oligomer with the P36
protein oligomer of bacteriophage T4 is unstable at
temperatures between about 40°C and about 60°C.

13. An isolated polypeptide oligomer consisting
essentially of a variant of the P37 protein of bacteriophage
T4, wherein the carboxy-terminal domain of said oligomer is
modified so as to confer the ability of the entire
polypeptide to bind specifically to an immobilized ligand.

14. The polypeptide of claim 13, wherein said
ligand is selected from the group consisting of biotin,
immunoglobulin, or divalent metal ions.

15. A nanostructure comprising a plurality of
fusion proteins, said fusion proteins comprising a first
portion consisting of at least the first 10 N-terminal amino
acids of a tail fiber protein fused via a peptide bond to a
second portion consisting of at least the last 10 C-terminal
amino acids of a second tail fiber protein, wherein the tail
fiber proteins are selected from the group consisting of
gp34, gp35, gp36, and gp37 proteins of a T-even-like
bacteriophage, wherein the first and second tail fiber
proteins are the same or different.

16. The nanostructure of claim 15, wherein the
first and second tail fiber proteins are different.

17. The nanostructure of claim 15, which further
comprises a molecule that can self-assemble into a dimer or
trimer, fused to at least a 10 amino acid portion of a
T-even-like tail fiber protein.

18. The nanostructure of claim 17, wherein the
molecule has the structure of a leucine zipper.

19. The nanostructure of claim 15, wherein said
nanostructure comprises a linear one-dimensional rod.

20. The nanostructure of claim 15, wherein said
nanostructure comprises a polygon.

21. The nanostructure of claim 15, wherein said
nanostructure comprises a three-dimensional cage or solid.

22. The nanostructure of claim 15, wherein said
nanostructure comprises a two-dimensional open or closed
sheet.

23. An isolated fusion protein consisting
essentially of a portion of a gp37 protein of a T-even-like
bacteriophage consisting of at least the first 10-60
N-terminal amino acids of the gp37 protein fused to a second
portion of a gp36 protein of a T-even-like bacteriophage
consisting of at least the last 10-60 C-terminal amino acids
of the gp36 protein.

24. An isolated fusion protein consisting
essentially of a portion of a gp37 protein of a T-even-like
bacteriophage consisting of at least the first 10 N-terminal
amino acids of the gp37 protein fused to a second portion of
a gp36 protein of a T-even-like bacteriophage consisting of
at least the last 10 C-terminal amino acids of the gp36
protein.

25. An isolated fusion protein consisting
essentially of a portion of a gp37 protein of a T-even-like
bacteriophage consisting of at least the first 20 N-terminal
amino acids of the gp37 protein fused to a second portion of
a gp36 protein of a T-even-like bacteriophage consisting of
at least the last 20 C-terminal amino acids of the gp36
protein.

26. An isolated fusion protein consisting
essentially of a portion of a gp36 protein of a T-even-like
bacteriophage consisting of at least the first 10-60
N-terminal amino acids of the gp36 protein fused to a second
portion of a gp34 protein of a T-even-like bacteriophage
consisting of at least the last 10-60 C-terminal amino acids
of the gp34 protein.

27. An isolated protein comprising at least 20
contiguous amino acids of the gp37, gp36, or gp34 protein of
a T-even-like bacteriophage, and lacking at least 5 amino
acids of the amino- or carboxy-terminus of the protein.

28. An isolated DNA encoding the polypeptide of claim 1.

29. An isolated DNA encoding the polypeptide of claim 2.

30. An isolated DNA encoding the polypeptide of claim 4.

31. An isolated DNA encoding the polypeptide of claim 5.

32. An isolated DNA encoding the polypeptide of claim 6.

33. An isolated DNA encoding the polypeptide of claim 7.

34. An isolated DNA encoding the polypeptide of claim 9.

35. An isolated DNA encoding the polypeptide of claim 10.

36. An isolated DNA encoding the polypeptide of claim 11.

37. An isolated DNA encoding the polypeptide of claim 12.

38. An isolated DNA encoding the polypeptide of claim 13.

39. An isolated DNA encoding the protein of claim 23.

40. An isolated DNA encoding the protein of claim 25.

41. An isolated DNA encoding the protein of claim 26.

42. An isolated DNA encoding the protein of claim 27.

43. A method for making a polygonal nanostructure
comprising contacting the protein of claim 26 with purified
gp35 proteins of a T-even-like bacteriophage.

44. A method for making a nanostructure comprising
contacting a plurality of the proteins of claim 23 with each
other.

45. A kit comprising in one or more containers the
fusion protein of claim 23.

46. A kit comprising in one or more containers the
fusion protein of claim 25.

47. A kit comprising in one or more containers the
fusion protein of claim 26.

48. A kit comprising in one or more containers the
fusion protein of claim 26, and an isolated gp35 protein of a
T-even-like bacteriophage.

49. The protein of claim 23 wherein the T-even-like
bacteriophage is T4.

50. The protein of claim 26 wherein the T-even-like
bacteriophage is T4.

51. An isolated polypeptide consisting essentially
of a variant of the gp36 protein of bacteriophage T4, wherein
the interaction of said polypeptide with the P37 protein
oligomer of bacteriophage T4 is unstable at temperatures
between about 40°C and about 60°C.

52. An isolated polypeptide consisting essentially
of a variant of the gp36 protein of bacteriophage T4, wherein
the interaction of said polypeptide with the gp35 protein of
bacteriophage T4 is unstable at temperatures between about
40°C and about 60°C.

53. An isolated polypeptide consisting essentially
of a variant of the gp34 protein of bacteriophage T4, wherein
the interaction of said polypeptide with the gp35 protein of
bacteriophage T4 is unstable at temperatures between about
40°C and about 60°C.
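
Several of the claims above define constructs by residue ranges counted from the amino terminus. The arithmetic behind those ranges can be sketched as below. This is only an illustration and not part of the patent: the helper names `segment` and `seg` are hypothetical, residues are represented as (protein, position) pairs rather than real sequence, and the protein lengths are those implied by the sequence listing and by the ranges in claims 2 and 7. For claim 2 the gp37 portion is assumed to be amino-terminal, as the termini described in claim 3 suggest.

```python
def segment(protein: str, length: int, first: int, last: int):
    """Return residues first..last (1-based, inclusive) as (protein, position) pairs."""
    if not 1 <= first <= last <= length:
        raise ValueError(f"{protein}: range {first}-{last} is outside 1-{length}")
    return [(protein, pos) for pos in range(first, last + 1)]


# Lengths assumed from the listing (gp37 ends at residue 1026) and from the
# upper bounds of the residue ranges recited in claims 2 and 7.
LENGTHS = {"gp34": 1289, "gp36": 221, "gp37": 1026}


def seg(protein: str, first: int, last: int):
    """Convenience wrapper that looks up the assumed protein length."""
    return segment(protein, LENGTHS[protein], first, last)


# Claim 1: gp37 lacking amino acids 99-496.
claim1 = seg("gp37", 1, 98) + seg("gp37", 497, 1026)

# Claim 2: gp37 residues 1-242 joined to gp36 residues 118-221
# (order assumed: gp37 portion first).
claim2 = seg("gp37", 1, 242) + seg("gp36", 118, 221)

# Claim 7: gp36 residues 1-73 amino-terminal to gp34 residues 866-1289.
claim7 = seg("gp36", 1, 73) + seg("gp34", 866, 1289)

for name, construct in [("claim 1", claim1), ("claim 2", claim2), ("claim 7", claim7)]:
    print(name, len(construct), "residues")
```

Under these assumptions the three constructs come to 628, 346 and 497 residues respectively. Because the ranges are inclusive, each segment contributes last - first + 1 residues.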



8471-005 (SHEET 1 OF 19)

T4 Genes 34-37 seq -> List

DNA sequence 8855 b.p. TAGGAGCCCGGG ... CGGCCCTTCTAA linear

Gene34: bp16-3885; OrfX: bp3894-4091; Gene35: bp4127-5014; Gene36: bp5077-5742; Gene37: bp5752-8831.



[Sequence listing: nucleotides 1-3900 of the 8855 b.p. sequence, 60 bases per row in six blocks of ten.]



FIG. 6 
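
The gene coordinates given in the sheet headers can be checked arithmetically, since a coding region given as inclusive base positions should span a whole number of codons. The sketch below is illustrative only and not part of the patent: the helper name `coding_summary` is hypothetical, and it assumes each range includes the stop codon. Gene 37 is entered with both start positions that appear in the sheet headers (5752 on the List sheet and what appears to be 5751 on the Genes sheet).

```python
def coding_summary(start: int, end: int):
    """Return (length in bp, whole codons, leftover bases) for an inclusive range."""
    length = end - start + 1
    codons, remainder = divmod(length, 3)
    return length, codons, remainder


# Coordinates (1-based, inclusive) as read from the sheet headers.
REGIONS = {
    "gene 34": (16, 3885),
    "orfX": (3894, 4091),
    "gene 35": (4127, 5014),
    "gene 36": (5077, 5742),
    "gene 37 (start 5752)": (5752, 8831),
    "gene 37 (start 5751)": (5751, 8831),
}

for name, (start, end) in REGIONS.items():
    length, codons, remainder = coding_summary(start, end)
    note = "" if remainder == 0 else f" ({remainder} bases left over)"
    print(f"{name}: {length} bp, {codons} codons{note}")
```

Under these assumptions gene 34 spans 1290 codons (1289 residues plus a stop), which agrees with the residue numbering in claim 7, and gene 36 spans 222 codons, which agrees with residue 221 in claim 2. Only the 5751 start gives gene 37 a whole number of codons (1027, that is, 1026 residues plus a stop).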



WO 96/11947 8471-005 (SHEET 10 OF 19) PCT/US95/13023

T4 Genes 34-37 seq -> List



[Sequence listing, continued: nucleotides 3901-8855 of the 8855 b.p. sequence, 60 bases per row in six blocks of ten.]

T4 Genes 34-37 seq -> Genes

DNA sequence 8855 b.p. TAGGAGCCCGGG ... CGGCCCTTCTAA linear

Gene34: bp16-3885; OrfX: bp3894-4091; Gene35: bp4127-5014; Gene36: bp5077-5742; Gene37: bp5751-8831.

1 TAGGAGCCCGGGAGA ATG GCC GAG ATT AAA AGA GAA TTC AGA GCA GAA GAT GGT CTG GAC GCA 63
1 MAEIKREFRAEDGLDA 16

M GOT OCn* CA1» JUUi ATA ATC AAC 0?I1A OCT 1^ OCT GAT CXIT AOC OTO JP 

17COD1CIIHVAI.ADRTVOTDOV 36 

134 AAC OTT COT tl^C TTA ATT CAA G»A AAC ACA GTT CAA CAO TAT OAT CCA ACT COT 0» 183 

37HVDYLIOEHtVOQyOPTROY' 56 

1B4 TTA AAA CAT TTT OTA ATC ATT TAT GAT AAC CGC TTT TOO OCT OCT ATA AAT OAT ATT CCA 243 

57LKDPVIiyDNRFWAAlH»I' ^» 

244 AAA CCA OCA OGA GOT TTT AAT AOC OGA CGC TOO AGA GCA TTA COT ACC OAT OCT AAC TOO 303 

77KPAOAFHSORWRALRTDAHW 96 

304 ATT AOS CTT TCA TCT OCT TCA TAT CAA TIA AAA TCT OOTOAA OCA ATT TOO OTT AAC ACC 363 

97ITVSSGSYQI.KSCEXX SVKT 1X6 

364 OCA OCT OGA AAT GAC ATC AC30 TTT ACT TTA CCA TCT TCT CCA ATT OAT OCT OAT ACT MC 4J3 

117AAOHDITFTLPS5PIDODT1 136 

424 OTT CTC CAA CAT ATT OOA GOA AAA CCT OGA CTT AAC CAA OTT TTA ATT OTA OCT CCA OIA 483 

137VLQDIOOKPCVHQVI.1VAPV 156 

484 CAA ACT ATT OTAAACTTPAOAOGTOAACAOOrACOTTCAOIACEAATOACTCATCCAXAO 543 

ISTOSIVKFRCEOVRSVLMTRPK 176 

S44 TCA CAO CTA OTT TIA ATT TTT AGT AAT COT CTO TOO CM ATB TAT OTT OCT OAT J03 

177S0I,Vl,IFSMRLW0MTVAOYS 196 

604 AGA OAA OCT ATA CtT OTA ACA CCA OOO AAT ACT TAT CAA OCO CAA TCC AAC OAT TO 663 

197REAIVVTPANTYQAQSHDP1 *16 

664 CTA COT AGA TIT ACT TCT GCT GCA CCA ATT AAT GTC AAA CTT CCA AOA TTT OCT AAT CAT 723 

217VRRFTSAAPINVKI.PRFAHH 236 

724 OOC CAT ATT ATT AAT ITC GTC CAT TTA CAT AAA CTA AAT CCC CTT TAT CAT ACA ATT CTT 783 

237 CDIINFVDLDKLHPLYMTIV 256 

784 ACT ACA TAC CAT CAA AOO ACT TCA OTA CAA CAA OTT OQA ACT CAT TCC ATT OAA OOC COT 843 

257 TTYDETTSVOEVGTHSIBCR 276 

844 ACA TCG ATT CAC OCT TIC TIC AT6 TPT CAT GAT AAT GAG AAA TTA TGG AGA CTO TTT GAC 903 

2>?TSlI>CrLllFDDMEKI.MRLFD 296 

9<M OOC GAT AOT AAA COB OGT TTA a» ATC ATA AOO ACT AAT TCA AAC ATT COT CCA AAT OAA 963 



397CDSKARLRX 



ITTHSMXRPltC 316 



964 0AAOPrA10 0»TOOOTGOCAATAACOGAACAACTCAAACAATTCAOCTrAAOCTTCCA 1023 

3l7EVMVFOAIIHOTTQTlEI.EI»P 33* 

1024 ACT AAT ATT TCrr OCT GOT OAT ACT OTT AAA ATT ICC ATO AAT TAC OTO AOA A^ 10B3 

337TlttSVC0TVKXSHIIYMRKC0 356 

1084 AOiCTTAAAATCAAAGCTOCTGATOAA6ATAAAATrOC»lCrTCAGrrCAAnOCIO«A 1143 

357 T V K I K A A 0 E D K I A 5 » V O I' !• 0 W 

1144 TICCOiAAACOCTCAOAATATCCACCTOAACCTOAATaOOTTACAGTTCAAOAAJTAOrT U03 

377 F PKRSBY PPEAEHYTV0BLV 396 

1204 TIT AAC OAT OAA ACT AAT TAT OTT CCA OTT T10 GAG CTT OCT TAC ATA OAA OAT OAT 1263 

397 FHDETHYVPVLBI.RYIBD8I> 41* 

1264 OOA AAA W TOO CIT CTA CAC CAA AAC CTT CCA ACT OTA OAA AOA OTA C^T TCT TO 1323 

417 GKYWVVQQKVPTVERVrSLH 436 

OATTCTACTAOAOCAAOATTAGOCOlAATTCCTTTAOCTACACAAOCrCAAOCTAA^ 



1324 

437 D0TRARI»O 



V l" A " A ' T ' 0 A 0 A K V 456 

1443 



1384 GATTTACAAAATTCTCCACAAAAAOAAWAOCAATTACTCCAOAAAOTmOCTA^^ 

457 0I.EHSPQEBtAI»P«*tAIIR 476 

1444 ACTQCTACAQAAACTO0CAOAOGTATT0CAMaAATA0CAWrrJ[CT«:TCAAO^ 1503 

477 TATBTRROJARIATTAQVH0 486 
1504 MIC ACT ACSk TVC ICT TW OCT <MT <»T ATT AtC AtC ACT OCT *M AKO CXO AAT OAA AOA 1563
4»7irTTP«FAOPlIlTPKKt»«» 

1564 ACT OCT ACA OAA ACT COT AGA OCT OTC GCA GAA ATT OCT AGO CAG CAA OAA ACT AAT OCA 1623 

517TATKTRROVAEXAT00BTKA 53S 

1624 GGA ACC GAT OAT ACT ACA ATC ATC ACT CCT AAA AAO CTT CAA OCT OCT CA* ^ ?^ Jf 

S37 0TDOTT 1 IT PKKtOARQO»B 5S6 

168« TCA TTA TCT OCT AOT GTA ACC TTT OTA TCP ACT OCA OCT OCT ACT CCA OCT TCT AOC COT 1743 

557 SLSOlVTFVSTAOATPASffR 576 

1744 CM TTA AAT GOT ACie AAT OTT TAT AAT AAA AAC ACT CAT AAT TTA CTT err TCA CCT AAA 1103 

577 ELHCTMVYMK1ITDNLVV3PK S96 

1B04 OCT no OAT CAO TAT AAA OCT ACT CCA ACA CAO CAA OCT OCA OTA ATT TTA OCA OTT GAA lt63 

597 ALD0TKATPTQQCAVXLAVB 616 

1864 ACT OAA OTA AW OCT OOA CAA AOT CAG CAA OOA TOO OCA AAT OCT OTP OTA AOO CCA OAA 1923 

6X7SEVJAOQSOOOWAHAVVTPE 636 

1924 AOOTTACATAAAAAaACATCAACT OAT OOA AGA ATT GOT T9» ATT OAA ATT OCT AOS CAA 19B3 

637 TLKKICTSTDGRIOI.IEXATQ 656 

1984 AOT GAA GIT AAT AGA OOA ACT GAT TAT ACT GOT OCA OTC ACT OCT AAA ACT TXA AAT OAC 2043 

6S7SEVHTOTOYTRAVTPKTI.»» 676 

2044 COT AGA GCA ACT OAA AOT TTA ACT GOT ATA OCT OAA ATT OCT ACA C*A OTT OAA TTC OAC 2103 

677 RR .ATESLSCIAB1AT0VBFD 696 

2104 GCA OGC OTC GAC OAT ACT COT ATC TCT ACA CCA TTA AAA ATT AAA ACC AGA TTT AAT AOT 2163 

697 AGVDDTRISTPLK1KTRFMS 716 

2164 ACT GAT COT ACT TCT OTT OTP OCT CTA TCT OOA TPA OTT OAA TCA OOA ACT CTC TOO OAC 2223 

717TDRTSVVA1.S0LVESCTLWD 736 

2224 CAT TAT ACA CTT AAT ATT CTT OAA OCA AAT OAO ACA CAA OCT GOT ACA CTT COT OTA OCT 2283 

737 HYTLWIIrEAHETQRGTLRVA 756 

2284 AGO CAO OTC GAA OCT OCT GCG OOA ACA TTA GAT AAT OTT TTA ATA ACT OCT AAA AAO CTT 2343 

757 TQVEAAAOTtDWVl#ITPKKL 776 

2344 TIA OCT ACT AAA TCT ACT GAA OCOCAACAOOCTCTTATTAAAGTTGCA ACT OiG TCT OAA 2403 

777 LOTltSTBAQEOVIKVAT0BE 796 

2404 ACT GTB ACT OOA AOB TCA OCA AAT ACT OCT OTA TCT CCA AAA AAT TTA AAA TOO ATT QCO 2463. 

797 TVTQTSAMTAVSPKNLRWXA 816 

2464 CAC AOT GAA OCT ACT TOO «ICA OCT ACT ACT OCA ATA AGA GOT TTT OTT AAA ACT TCA TCT 2523 

817 0SBPTWAATTAIRCPVKTSS 836 

2524 COT TCA ATT ACA TTC OTT OCT AAT GAT ACA CTC OCT TCT ACC CAA CAT TTA OAA CIO TAT 2583 

837 6SlTrVCIIDTVCST0I>l*B«»V •5* 

2584 GAC AAA AAT AOC TAT GOO OTA TCA CCA TAT GfcA TIA AAC CGT OTA TTA OCA AAT TAT no 2643 

a57EKIISyAVSPyEJUHRVLAHyL 876 

2644 CCA CTA AAA OCA AAA CCT OCT CAT ACA AAT TTA TTO OAT GOT CTA GAT TCA TCT CM TTC 2703 

877 PI,KAKAADTNI»I*DC1.DSS0F 896 

2704 ATT CCT AGO GAT ATT OCA CAG ACO GTT AAT OCT TCA CTA ACC TTA ACC CAA CfcA ACQ AAT 2763 

897 IllftDXAOTVIICSL TZ»TOQTN 916 

2764 cm ACT OOC OCT CTT OTA TCA TCT ACT aST OCT OAA TTT GOT GOT TCA TPO OOC OCT AAT 2823 

917&SAPt»VS55TOErGGSbAA» 036 

2824 AQAACATTTACCATCOOTAATACAOQAGOCOOOACPAOTATCOITTICOAAAAAOOTOCT 2883 

937 RTPTlAH»OAPTSrVFEEOr 956 

2884 QOi TOC 000 OCA AAT OCT <3CA CAO TCA ATB AOT ATT COT GTA 100 COT ARC CAA TIT OOC 2943 

957 A S G A H P A Q S M S 1 R V . H. 0. Q F, O 976 

2944 OGC CCT AOT GAT AOB ACC COT TCO ACA GTG WT CAA OTT OGC GAT OAC ACA TCT CAT CAC 3003 

977 GGSOTTRSTVFEVGDDTSHH 996 

3004 TTT TAT TCT OM OCT AAT AAA OAC OCT AAT ATA GOB TIT AAC ATT AAT GOT ACT OTA ATO 3063 

997 FY SQRHKDGHlAFHJMGTVM 1016 

3064 CCA ATA AAC ATP AAT OCT TOC OCT TPS ATO AAT OW AAT OOC ACT OCA ACA TO COT COT 3123 

1017 P I NIMASOLMHVHCTAT FGR 1036 

3124 TCA OTT ACA OOC AAT OOT OAA TTC ATC AGC AAO TCT OCA AAT OCT TTT AOA OCA ATA AAC 3183 

1037 8VTANOBFI6XSANAFRAIN 1056 
3184 OCT OAT TJkC 0» TIC TTT ATT <OT OIlT OCC TCT AXT JkCC W

10S7 ODYOFri»MDASKTirrLl.TA 1076 

3244 CCC OCT OAT CMS ACT GOT GOT TTT JttT OOA m COC CCA TIA TTA ATT A^^ "« 
1077 ACD0TOOPMOtRPLI.XilWQ» 



330« OCT CAQ ATT ACA ATT OOT GAA OtXr TIA ATC ATT <XX: AAA OOT OTT ACT ATO MT TO 33J3 

1097 GQITIOEOLIIAKCVTIHSO 1116 

33S4 OCT WA ACT OTT AAC TOT AOA ATT CXrr TCT CAG OCrr ACT AAA ACA TCT <»T TTA t^^ 3423 

1117 OLTVHSRXR«0OTKTSDLYT U36 

3424 OST OCO CCA ACA TCT OAT ACT OTA OOA TTC V» TCA ATC OAT ATT AAT OAT TO 3483 

1137 R A P TSDTVG flfS IDIWDSAT 1156 

3484 TAT AAC CAG TIC COG OOT TAT TIT AAA ATO err OAA AAA ACT AAT OAA OTO ACT 000 CTT 3543 

1157 YMQFPOTFltMVBKTKEVTOL 1176 

3544 CCATACTTACWOJTOOCCAAOAAOrrAAATCTCCTOCTACACTSACTCMSTO 3603 

1177 PYLBROESVKSPOTLTOPON 1196 

3604 ACACTTQATtCOCITTACCAAaAT»OOATTACTTATOCAA0»AO80CAaAAy» 

1197 TLDSLYQDWItYPTTPEAAT 1316 

3664 ACTCOCTQOACAaWACATSOayaAAAACCAAAAACTCTTaQTOAGTTTTOrPaOOIA 3723 

1217 TRWTRTWOKTKHSHfiSFVOV 1236 

3724 TIT GAC 0(» OCT AAC CCT CCT CAA CCA TCT OAT ATC OCT OCT TTA CCA TCT OAT AAT O^ 3783 

1237 FDCCNPPOPSDIOALPSDHA 1256 

3784 ACA ATO 000 AAT CTT ACT Arr OAT TTC TTO OC» ATT <Xyr AAT OTT CGC ATT 3843 

1257 TMONLT1RDPI.R ICWVR1VP 1276 

3844 GACCCACTOAATAAAACOOTTAAATTTQAATDOGITOAATAA OAOOTATT ATO OAA AAA TTT 3905 
1277 DPVNKTVKrKWVE* HEKF4 

3906 ATO OCC OAO ATT TCW ACA AOO ATA TOT OCA AAC CCC ATT TI* TCX5 OA* AGT AAT TO 3965 

5MAEIWTRICPHAILSESNSV 24 

3966 AGA TAT AAA ATA AOT ATA GCG OCT TCT TOC CCC CTT TCT ACA OCA OOA CCA TO TAT O^ 4025 

25RYKI StAOSCPLSTACPSYV 44 

4026 AAATITCIW OAT AAT CCTOTAGOAAOTCAAACATTPAOOCOCAOOOCTTOTO 4085 

45KF0DtfPV6SOTFRRRPSFKS 64 

4086 TIT TO CIXTTCCACaWIWCAT«OTTOTAOf»A0TOT ATO CIT TTT OSA CTT »A ATO ATA CtA 4153 
r * KIpFRI»0HXL9 

4154 CAT CAC^CTC CTTTTOTIAGTTTICATCAATTCTTrSAOCAATAATTOATrGrTOCrAaA 4213 

lOHOtLI^LVFMltSLTllllRlVAl 29 

4214 TIAACTACTOGAAAOOTrAATTrTCCTCCrOAAOlACTATCTTEWTIAAaAAtt 4273 

30LTSOKVMFPPEVVSWI.R TAG 49 

4274 AOOTCTOCXTTTCCATCT OAT TCT ATA TTOTOAGATITCACOTATO 4333 

50TSAFPSDSILSRFDVSYAAF 69 

4334 TATACTTCTlCTAAAAOAOCTATCOCATTAOAOCATOTrAAACTaAflTAATAOAAAAW^ 4393 

70TT8SltllAXAl.EMV RI.SMRKS 89 

4394 RCA GAT OiT TAT OA ACT ATP TIA OAT <W cm TTTtSAC ACT JTA 4453 

90TDI>rOTII»D. VVPDSLEI^VOA 109 

4454 ACCGOOTTTCCAAOAAOAACJOTATCAAAOTOrPGWCAATTCATOTOGCACITO^ 4513 

110TOFPRRTYBSVEQFMSAV06 129 

4514 ACT AAT AAC OAA ATT GOO ACA TTO CCA ACT TO GOT OCT ATA JU» AAA TIA TCT ttT W 4573 

130TM1IEIARI. 'PT8AAISKI^«0* "» 

4574 AAT TIA ATT CCT OQA CAT CTT CTT TAT CTP AAA OCT CAO TTA TAT OCT GAT OAT TO 4633 

150 H L 1 P C D V I. Y X K A 0 L Y A D A D I. i« 

4634 CTT OCT CTT OOA ACT ACA AAT ATA TCT ATC COT TTT TAT AAT OCA TCT lAC OOA »T ATT 4693 

170I.ALOTTIII«IRFyHASMGYl 189 

4694 TCTTOACACAAOCTOAATTrACTGOOCAAOCTGQOTOTOOGAATTAAAOO^OATTAT 4753 
190fiST0AEFTOQAOSWBX'KP*>^ 

4754 Of^GTTOlTCCAOAAAACOCAOTAOOATTrAOOATATACGCACAOAOAACTO^ 4813 

210VWVPBIIAV6FTIYAQRTA0A m 
4814 GOC CM. GOT OOC ATS AGA AAT TTA AGC TTT TCT GM OTA TCA AOA AAT OQC OQC ATF ICS 4873 

230 GQCGHANLSPSB'VSKMOGZS 249 

4874 AAA OCT OCT CAA TTT OOC OTC AAT GOT ATT OCT OTT AA.T TAT ATC TGC OAA TOC OCT TCA 4933 

250 KPAePCVNOIltVNyZCBSA5 269 

4934 OCT COO OAT ATA ATS OTA OTP CCT ACO CAA OCA TOO TCT AAA ACT GOT AAA OIO TTT GOC 4993 

270»FPZHVI.PTOASSKTGKVrG 289 

4994 CAA GAA TTT AGA GAA OTT TAA A1TOA00QAC( X TIC W TT LXllVmt.vm TAAAl»C^^ 5046 

290 Q E F R E V * 296 

5047 GOGOCATACA ATO OCT GATTTAAAACRAGGTTCAACAACTGaAOOCTCTOTCATrTOOCAT 5127 

1 MADLKVGSTTCOSVIWH 17 

5128 CAA OOA AAT TTT CCA TPS AAT CCA OCC GOT GAC OAT OTA CTC TAT AAA TCA TTT AAA ATA 5187 

18 0GWPPI.NPACDDVI.yXSPXl 37 

5188 TAT TCA GAA TAT AAC AAA CCA CAA OCT OCT OAT AAC OAT TIC GIT TCT AAA OCT AAT GGT 5247 

38y5BYNKPQAADNDFVSXANO 57 

5248 GOT ACT TAT OCA TCA AAO OTA ACA TTT AAC OCT GOC ATT CAA OTC OCA tAT OCT CCA AAC 5307 

SOOTYASKVTPMAOi eVPYAPH 77 

5308 ATC ATB AOC CCA TQC 000 ATT 1AT 000 GOT AAC GOT GAT OQT OCT ACT TTT OAT AAA OCA 5367 

78lNSPCGZyG6NGDOATPDKA 97 

5368 AAT ATC GAT ATT GTT TCA TOG TAT OOC OTA OCA TTT AAA TOO TCA TTT OOT TCA ACA OOC 5427 

98IIIDI-VSWYGVOPKSSFOSTG 117 

5428 COA ACT OTT OTA ATT AAT ACA CGC AAT GGT CAT ATT AAC ACA AAA OOT OTT OTO TOO OCA 5487 

118RTVVINTRNODINTKCVVSA 137 

5488 OCT OOT CAA GTA AOA AGT GOT GOO CCT OCT CCT ATA OCA COO AAT OAC CTT ACT AOA AAO 5547 

138AO0VR80AAAP1AAHDLTRK 157 

5548 GAC TAT CTT OAT GGA GCA ATA AAT ACT CTT ACT OCA AAT OCA AAC TCT AGO OTO CTA COO 5607 

158DYVDCAINTVTANAHSRVLR 177 

5608 TCT OOT GAC ACC ATO ACA GOT AAT TTA ACA OCG CCA AAC TTT TTC TOO CAO AAT CCT OCA 5667 

178S6DTMTGHLTAPHFFS0HPA 197 

5668 TCT CAA CCC TCA CAC GTT CCA COA TTT GAC CAA ATC GTA ATT AAG OAT TCT OTP CAA GAT 5727 

198SQFSHVPRPDQIVIKDSV0D 217 

5728 TTC GOC TAT TAT TAA GAOOACTT ATO CCT ACTTTAAAACAAATACAATTPAAAAOAAOCAAA 5789 
218PCYY* HhVhKQJQFKRSKli 

5790 ATC GCA GGA ACA OOT CCT OCT GCTTCAGTATTAOCCCAAOOTGAATTBOCTATAAACTIA 5849 

14IACTRPAASVI.AECBLAlHt 33 

5850 AAA GAT AOA ACA ATT TIT ACT AAA GAT CAT TCA OOA AAT ATC ATC CAT CTA GOT TTT OCT 5909 

34K»RTIFTK»DSCN1IDLCPA 53 

5910 AAA GOC OOO CAA OTT GAT OOC AAC CTT ACT ATT AAC OOA CTT TTO AGA TTA AAT OOC GAT 5969 

54KOG0VDCKVTIMGLI.RLMCD 73 

5970 TAT CTA CAA ACA GGT OOA ATO ACT OTA AAC OOA CCC ATT GOT TCT ACT OAT OOC OTC ACT 6029 

74yVQTOCMTVWO?IOSTDOVT 93 

6030 GGA AAA ATT TIC AGA TCP ACA CAO OCT ICA TIT TAT OCA ACA OCA ACA AAC GAT ACT TCA 6089 

94CKJFII»T0GSFYA.1IAT»DTS 113 

6090 ANT OOC CAT Tth TOO «T GAA AAT OCC GAT OOC ACT GAA OOT GOC OTT ATA TAT OCT OOC 6149 

lUHAKLWPBIIADOTBRGVZYAR 133 

6150 OCT CAA ACT ACA ACT OAC GOT GAA ATA OOC CTT AOO OTP AOA CAA OOA ACA GGA AOC ACT 6209 

134POTTTPOBIRI.RVROCTOST 153 

6210 OOC AAC ACT GAA TTC TAT TTC OOC TCT ATA AAT OOA OOC GAA TTT CAO OCT AAC COT ATT 6269 

154AHSBrYFRSlMGGEFQAMRl 173 

6270 TTA OCA TCA GAT TOO TTA OTA ACA AAA CGC ATT OCO OTT OAT ACC OTT ATP CAT OAT OCC 6329 

174LASD SLVTKRIAVDTVIHDA 193 

6330 AAA OCA TTT OOA CAA TAT OAT TCT CAC TCT TTO CTT AAT TAT OTT TAT CCT GGA ACC GOT 6389 

194KAPGQYDSHSI.ViiyVyPGTO 213 

6390 GAA ACA AAT GGT OTA AAC TAT CTT COT AAA GTT OOC OCT AAO TOC GOT GOT ACA ATT TAT 6449 

214BTHOVHYLRKVRArS0CTiy 233 

6450 CAT GAA ATT GTT ACT OCA CAA ACA OOC CTG OCT GAT GAA CTT TCT TOO TDC TCP OCT CAT 6509 

234KEIVTA0T .GLADBV8WtlSGD 253 
6510 M» CCJi OTA TIT iUA CW «C OCT ATT <OT OAC GAT OCX: AO^ 6569 

6570 C1TOCATT9iOQTACATlCACTACAAATT1CO0O TCT.AOT OAT TAT OOC AAC OTC GOT OTA 6629 

274 LALOTPTTllFPSSDYOirVOV 293 

6630 ATO OOC OAT AAO TiiT CIT OTT CTCa0CGM;ACTO3AACT00CTV0 tCA TftC AA^ AM ACT 6689 

294 MOOKrtVLCt)TVT01.STKKT 313 

6690 OOT OTA ITT OAT ClIA OIT OOC OCT OQA TAT TCT OIT OCT TCT AlT ACT CCT Q»C MT TIC 6749 

314CVrDLVGOOYSVASlTPDSF 333 

6750 OCT ACT ACT COT AAA OOT ATA TTT OOT CCT TCT GAO GAC CM OOC OCA ACT TOO ATA ATQ 6809 

334RSTRKGIFG»SBDOOATI#XM 353 

6810 CCT OOT ACA AAT OCT OCT CTC TlO TCT OTP CAA ACA CAA OCT OAT AAT AAC AAT OCT OCSA 6869 

354 PGTNAAX.L8VQTQADNIIliAQ 373 

6870 GKC OQA CAA AOC CAT ATC 000 TAC AAT OCT OOC OGT AAA ATO AAC GAC TAT TTC GOT OOT 6929 

374 0GQTKIOYHAOOK1INBYFRO 393 

6930 AO^ OOT CAO ATO AAT ATC AAT ACC CM CAA OOT ATO GAA ATT AAC COO OOT ATT TW AAA 6989 

394 TGQIfNIKTaOOM£tlfPOXLK 413 

6990 TlO OTA ACT OOC TCT AAT AAT OTA CM TTT TAC OCT GAC QOh ACT ATT TCT TOC AIT CM 7049 

414X*VT0 8irilVQFyADOTXSSXQ 433 

7050 CCT ATT AAA TTA OAT AAC GAO A1A TTT TEA ACT AM TCT MT MT ACT OOO GOT CW AM 7109 

434 PXKX.DtlEIPX.Tlt8|lirTAO[pK 453 

7110 TTT OGA CCT CCT AOC CM GTT QATOOCACAAOOACTATCCMTQaAAC OOT OOT ACT OOC 7169 

454 FOAPSOVD OTATXOWMOOTR 473 

7170 GM OGA CAO MT AM MC TOT OTO ATT ATT AM CCh TOO OOT AAC TCA TTT MT OOC ACT 7229 

474 ECQNKHyVIIKAWGMSFllAT 493 

7230 OGT OAT A0A TCT OOC GM AtO OTP TPC CM CTA TCA GAT ACT CM OOA TAT TAT TTT TAT 7289 

494 CDRSRETVFOVSDSQOirypr 513 

7290 CCT CAT OGT AM OCT CCA ACC OOC GAC GM ACT ATT OGA CCT ATT GM OCT CM TTT GCT 7349 

514ABRKAPTODBTIGRIEAQFA 533 

7350 OOO GAT GTT TAT OCT AM GOT ATT ATT GCC MC OGA MT TTT AGA GTT GTT GOO TCA AOC 7409 

534 GDVYAKGIIANGHPRVVGSS 553 

7410 GCT TTA GCC OOC MT GTT ACT ATO TCT MC OCT TTG TTT GTC CM GOT OGT TCT TCT ATP 7469 

554 ALAOHVTMSHGLrVOOOS«X 573 

7470 ACT OCACMOTTAMATTOaCOGAACACCAAACOCACTOAGAATTTOOMCOCTOMTAT 7529 

574 TO0VKX66TANAi:.RXIfNAiy 593 

7530 GOT OCT ATT TIC OGT COT TCO GM AGT MC TTT TAT ATT ATP CCA AOC MT CM MT OM^ 7589 

SMCAirRRSESMFYIXrTIIOItB «13 

7590 OGACMAGTOCIAGACATTCACAOCTeTTIOAfiACCTGTQAOAAl^OOATTAAACOATOOC 7649 

614GESCDIB5SLRPVRIGLMDO 633 

7650 AtO OTP OOO TTA OOA AGR CAT TCT TIT ATA GTA GAT CAA MT MT OCT VIA ACT AOO ATA 7709 

634 MVOX.GRDSF1VOOHKAX*TTI 653 

7710 AAC ACT AAC TCT COCATPMTOCCAACTTTAOAATOCMTIOOOOCWTOOOCATAC ATT 7769 

654NSNSRXNAMFRKQI»OQSAYX 673 

7770 GAT CCA OM TCT ACT OCT OTP OOC COO OOO OOP OCA OGT ICA TIT OCT TCC CAO MT 7829 
674 DAECT DAVR P"^AGAG»PAS 01» 

7830 AAT CM OAC CIC a» OOO COO TTC T*T ATO MT AT? GAT AC» ACT OAT OCT AGT OCA TAT 7889 

694 M BDVRAPF-YM)! IDRTOASA Y 713 

7890 GTTCCTATTTIBAMCM00TTATGTTCMO0CMTG0CT8CTAT TCA TIA GOO ACTJTA 7949 

714VPILKORTVOOHOCY«itOtX» 733 

7950 AIT MT MT OOT MT ncOGAOTrCATlACCATGOCGOCGGA GAT AAC OOP TCP ACA OOT 8009 

734 XNNG1IFRVHYHGGGDMOSTO 753 

8010 OCA CAO ACT OCT CSAT TTP OOA TOO GM TTP ATT AM AAC GOT GAT TTT ATT fCA OCT CXX; 8069 

754 P0TADFOltEPlKH ODriSPIl 773 

8070 GAT TTA ATA OCA OOC AM GTC AGA TTT GAT AGA ACT OOT MT ATC ACT OOT GOT TCT OOT 8129 

774 DI.IAOKVRPDRTOK1TOOSO 793 